path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [ARM GlobalISel] Move the check for Thumb higher up | Diana Picus | 2017-11-03 | 1 | -6/+6
  We're currently bailing out for Thumb targets while lowering formal parameters, but there used to be some other checks before it, which could've caused some functions (e.g. those without formal parameters) to sneak through unnoticed.
  llvm-svn: 317312
* [AArch64] Use dwarf exception handling on MinGW | Martin Storsjo | 2017-11-03 | 2 | -1/+11
  Ideally we should probably produce WinEH here as well, but until then, we can use dwarf exceptions, without any further changes required in clang, libunwind or libcxxabi.
  Differential Revision: https://reviews.llvm.org/D39535
  llvm-svn: 317304
* [X86] Remove PALIGNR/VALIGN handling from combineBitcastForMaskedOp and move to isel patterns instead. | Craig Topper | 2017-11-03 | 2 | -22/+131
  Prefer 128-bit VALIGND/VALIGNQ over PALIGNR during lowering when possible.
  llvm-svn: 317299
* Avoid PLT for external calls when attribute nonlazybind is used. | Sriraman Tallam | 2017-11-03 | 1 | -2/+9
  Differential Revision: https://reviews.llvm.org/D39065
  llvm-svn: 317292
* [AArch64][RegisterBankInfo] Add mapping for G_FPEXT. | Quentin Colombet | 2017-11-02 | 3 | -1/+89
  This fixes http://llvm.org/PR32560.
  We were missing a description for half floating point type and as a result were using the FPR 32 mapping. Because of the size mismatch the generic code was complaining that the default mapping is not appropriate. Fix the mapping description so that the default mapping can be properly applied.
  llvm-svn: 317287
* [AArch64][RegisterBankInfo] Add FPR16 support in value mapping. | Quentin Colombet | 2017-11-02 | 3 | -35/+48
  NFC.
  llvm-svn: 317286
* [X86] Give AVX512VL instructions priority over their AVX equivalents. | Craig Topper | 2017-11-02 | 1 | -2/+2
  I thought we had gotten all these priority bugs worked out, but I guess not.
  llvm-svn: 317283
* AMDGPU: Fix warning discovered by r317266 [-Wunused-private-field] | Konstantin Zhuravlyov | 2017-11-02 | 1 | -1/+0
  llvm-svn: 317280
* [Hexagon] Prefer L2_loadrub_io over L4_loadrub_rr | Krzysztof Parzyszek | 2017-11-02 | 1 | -52/+82
  If the offset is an immediate, avoid putting it in a register to get Rs+Rt<<#0.
  llvm-svn: 317275
* AMDGPU: Remove outdated fixme (it was already fixed) | Konstantin Zhuravlyov | 2017-11-02 | 1 | -3/+0
  llvm-svn: 317266
* [mips] Use register scavenging with MSA. | Simon Dardis | 2017-11-02 | 2 | -24/+19
  MSA stores and loads to the stack are more likely to require an emergency GPR spill slot due to the smaller offsets available with those instructions.
  Handle this by overestimating the size of the stack: determine the largest offset presuming that all callee save registers are spilled, and account for incoming arguments when determining whether an emergency spill slot is required.
  Reviewers: atanasyan
  Differential Revision: https://reviews.llvm.org/D39056
  llvm-svn: 317204
* [ARM] and, or, xor and add with shl combine | Sam Parker | 2017-11-02 | 1 | -7/+120
  The generic dag combiner will fold:
    (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
    (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2)
  This can create constants which are too large to use as an immediate. Many ALU operations are also capable of performing the shl, so we can unfold the transformation to prevent a mov imm instruction from being generated.
  Other patterns, such as b + ((a << 1) | 510), can also be simplified in the same manner.
  Differential Revision: https://reviews.llvm.org/D38084
  llvm-svn: 317197
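The fold described in this commit message can be checked with a small model. The following is a minimal Python sketch, not code from the patch: it models 32-bit wraparound arithmetic, and the helper names (`shl32`, `folded_add`, `is_arm_modified_imm`, and so on) are hypothetical. The encodability check assumes the A32 rule that a data-processing immediate is an 8-bit value rotated right by an even amount; Thumb2 uses different rules that are not modeled here.

```python
MASK32 = 0xFFFFFFFF


def shl32(x, n):
    """Shift left with 32-bit wraparound."""
    return (x << n) & MASK32


def unfolded_add(x, c1, c2):
    # (shl (add x, c1), c2)
    return shl32((x + c1) & MASK32, c2)


def folded_add(x, c1, c2):
    # (add (shl x, c2), c1 << c2)
    return (shl32(x, c2) + shl32(c1, c2)) & MASK32


def unfolded_or(x, c1, c2):
    # (shl (or x, c1), c2)
    return shl32(x | c1, c2)


def folded_or(x, c1, c2):
    # (or (shl x, c2), c1 << c2)
    return shl32(x, c2) | shl32(c1, c2)


def is_arm_modified_imm(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= MASK32
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a rotate right by `rot`.
        rotated = ((value << rot) | (value >> (32 - rot))) & MASK32
        if rotated <= 0xFF:
            return True
    return False
```

Under these assumptions both forms compute the same value, but the constants differ: 255 fits the A32 immediate rule while 510 (255 << 1) does not, which is the case from the message where `(a << 1) | 510` is better expressed as `(a | 255) << 1` with the shift attached to the consuming ALU operation.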
* The patch updates sched numbers for YMM AVX instrs such as VMOVx, VORx, VXOR, VPERMILx, VBROADCASTx, etc. | Andrew V. Tischenko | 2017-11-02 | 1 | -0/+93
  PR32857 should be closed.
  Differential Revision: https://reviews.llvm.org/D39227
  llvm-svn: 317196
* Revert "Correct dwarf unwind information in function epilogue for X86" | Petar Jovanovic | 2017-11-01 | 3 | -54/+0
  This reverts r317100 as it introduced sanitizer-x86_64-linux-autoconf buildbot failure (build #15606).
  llvm-svn: 317136
* [X86] Use foreach in X86.td to combine some of the CPU names that are obviously aliases. NFC | Craig Topper | 2017-11-01 | 1 | -52/+40
  llvm-svn: 317134
* [X86] Add CMOV feature to 'i686' processor, making it a proper alias of pentiumpro which I believe it should be. | Craig Topper | 2017-11-01 | 1 | -1/+1
  This is consistent with current gcc behavior.
  llvm-svn: 317133
* [X86][SSE] Add PACKUS support to LowerTruncate | Simon Pilgrim | 2017-11-01 | 1 | -12/+26
  Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value.
  We have to account for pre-SSE41 targets not supporting PACKUSDW.
  llvm-svn: 317128
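The leading-zero condition in this message can be illustrated per lane. This is a hedged Python sketch rather than LLVM code: `packus_lane` and `truncate_lane` are hypothetical helpers that model one lane of an unsigned-saturating pack (signed input clamped to the unsigned range of the half-width result) and plain truncation.

```python
def to_signed(value, width):
    """Interpret the low `width` bits of value as a two's-complement integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def packus_lane(value, src_width=16):
    """One lane of an unsigned-saturating pack to half the source width."""
    dst_max = (1 << (src_width // 2)) - 1
    return max(0, min(dst_max, to_signed(value, src_width)))


def truncate_lane(value, src_width=16):
    """Plain truncation: keep only the low half of the lane."""
    return value & ((1 << (src_width // 2)) - 1)
```

When the upper half of the lane is all zeros, saturation never triggers and the pack equals truncation. A lane such as 0x0100 saturates to 0xFF while truncation gives 0x00, so the pack is only a valid truncate when the zero bits are known.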
* [X86] Add custom code to EVEX to VEX pass to turn unmasked 128-bit VALIGND/Q into VPALIGNR if the extended registers aren't being used. | Craig Topper | 2017-11-01 | 1 | -0/+21
  This will enable us to prefer VALIGND/Q during shuffle lowering in order to get the extended register encoding space when BWI isn't available. But if we end up not using the extended registers we can switch to VPALIGNR for the shorter VEX encoding.
  Differential Revision: https://reviews.llvm.org/D39401
  llvm-svn: 317122
* AMDGPU: Fix set but not used warnings related to AMDGPUAS | Konstantin Zhuravlyov | 2017-11-01 | 7 | -36/+32
  Differential Revision: https://reviews.llvm.org/D39499
  llvm-svn: 317114
* [X86] Prevent fast isel from folding loads into the instructions listed in hasPartialRegUpdate. | Craig Topper | 2017-11-01 | 1 | -0/+7
  This patch moves the check for opt size and hasPartialRegUpdate into the lower level implementation of foldMemoryOperandImpl to catch the entry point that fast isel uses.
  We're still folding undef register instructions in AVX that we should also probably disable, but that's a problem for another patch.
  Unfortunately, this requires reordering a bunch of functions which is why the diff is so large. I can do the function reordering separately if we want.
  Differential Revision: https://reviews.llvm.org/D39402
  llvm-svn: 317112
* Adds code to PPC ISEL lowering to recognize half-word inserts from vector_shuffles, and use P9 shift and vector insert instructions instead of vperm. | Graham Yiu | 2017-11-01 | 3 | -5/+139
  Differential Revision: https://reviews.llvm.org/D34160
  llvm-svn: 317111
* [X86] Add 64-bit int to float/double conversion with AVX to X86FastISel::X86SelectSIToFP | Craig Topper | 2017-11-01 | 1 | -3/+4
  Summary:
  [X86] Teach fast isel to handle i64 sitofp with AVX.
  For some reason we only handled i32 sitofp with AVX. But with SSE only we support i64 so we should do the same with AVX.
  Also add i686 command lines for the 32-bit tests. 64-bit tests are in a separate file to avoid a fast-isel abort failure in 32-bit mode.
  Reviewers: RKSimon, zvi
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D39450
  llvm-svn: 317102
* Update VCVTx, VMOVNTPx and VROUNDYPx instructions scheduling on btver2. | Andrew V. Tischenko | 2017-11-01 | 1 | -0/+39
  Differential Revision: https://reviews.llvm.org/D39059
  llvm-svn: 317101
* Correct dwarf unwind information in function epilogue for X86 | Petar Jovanovic | 2017-11-01 | 3 | -0/+54
  This patch aims to provide correct dwarf unwind information in function epilogue for X86. It consists of two parts.
  The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific.
  The second part is platform independent and ensures that:
  - CFI instructions do not affect code generation
  - Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary.
  Changed CFI instructions so that they:
  - are duplicable
  - are not counted as instructions when tail duplicating or tail merging
  - can be compared as equal
  Added CFIInstrInserter pass:
  - analyzes each basic block to determine cfa offset and register valid at its entry and exit
  - verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors
  - inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA
  Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them.
  CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue.
  Patch by Violeta Vukobrat.
  Differential Revision: https://reviews.llvm.org/D35844
  llvm-svn: 317100
* [X86][SSE] Begun generalizing truncateVectorWithPACKSS to work with PACKSS/PACKUS functions | Simon Pilgrim | 2017-11-01 | 1 | -11/+14
  Renamed to truncateVectorWithPACK
  llvm-svn: 317098
* Revert r313618 "[ARM] Use ADDCARRY / SUBCARRY" | Roger Ferrer Ibanez | 2017-11-01 | 2 | -168/+20
  That change causes PR35103, so reverting until I figure it out.
  llvm-svn: 317092
* Fix warnings discovered by rL317076. [-Wunused-private-field] | NAKAMURA Takumi | 2017-11-01 | 3 | -5/+1
  llvm-svn: 317091
* Suppress a warning discovered by rL317076. [-Wunused-private-field] | NAKAMURA Takumi | 2017-11-01 | 1 | -0/+1
  llvm-svn: 317090
* [X86][SSE] Truncate with PACKSS any input with sufficient sign-bits | Simon Pilgrim | 2017-11-01 | 1 | -9/+12
  So far we've only been using PACKSS truncations with 'all-bits or zero-bits' patterns (vector comparison results etc.), when really we can safely use it for any case as long as the number of sign bits reaches down to the last 16 bits (or 8 bits if we're truncating to bytes).
  The next steps after this are to add the equivalent support for PACKUS and to support packing to sub-128 bit vectors for truncating stores etc.
  Differential Revision: https://reviews.llvm.org/D39476
  llvm-svn: 317086
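The sign-bit condition above can be made concrete with a per-lane model. This is an illustrative Python sketch only: `num_sign_bits`, `packss_lane` and `truncate_lane` are hypothetical names, and `num_sign_bits` counts the leading bits equal to the sign bit (including the sign bit itself), which is an assumption about how the count is defined.

```python
def to_signed(value, width):
    """Interpret the low `width` bits of value as a two's-complement integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def num_sign_bits(value, width):
    """Count leading bits that match the sign bit, sign bit included."""
    value &= (1 << width) - 1
    sign = value >> (width - 1)
    count = 0
    for bit in range(width - 1, -1, -1):
        if ((value >> bit) & 1) != sign:
            break
        count += 1
    return count


def packss_lane(value, src_width=16):
    """One lane of a signed-saturating pack, returned as raw half-width bits."""
    dst_width = src_width // 2
    lo = -(1 << (dst_width - 1))
    hi = (1 << (dst_width - 1)) - 1
    clamped = max(lo, min(hi, to_signed(value, src_width)))
    return clamped & ((1 << dst_width) - 1)


def truncate_lane(value, src_width=16):
    """Plain truncation: keep only the low half of the lane."""
    return value & ((1 << (src_width // 2)) - 1)
```

For a 16-bit lane packed to 8 bits, more than 8 sign bits means the value already fits in a signed byte, so saturation is a no-op and the pack matches truncation. With exactly 8 sign bits (for example 0x0080) the pack saturates to 0x7F while truncation yields 0x80.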
* [X86] Add more type qualifiers to INSERT_SUBREG operations in rotate patterns so they don't get created with a v64i8 type. | Craig Topper | 2017-11-01 | 1 | -8/+8
  Not sure why tablegen didn't error on this.
  Fixes PR35158.
  llvm-svn: 317079
* [X86] Add AVX512 support to X86FastISel::fastMaterializeFloatZero. | Craig Topper | 2017-11-01 | 1 | -4/+5
  llvm-svn: 317059
* [AMDGPU] Clean up symbols in the global namespace. | Benjamin Kramer | 2017-10-31 | 4 | -57/+41
  llvm-svn: 317051
* AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offset | Marek Olsak | 2017-10-31 | 4 | -17/+22
  Summary:
  Apps that benefit:
  - alien isolation
  - bioshock infinite
  - civilization: beyond earth
  - company of heroes 2
  - dirt showdown
  - dota 2
  - F1 2015
  - grid autosport
  - hitman
  - legend of grimrock
  - serious sam 3: bfe
  - shadow warrior
  - talos principle
  - total war: warhammer
  - UE4 demos: effects cave, elemental, sun temple
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D38914
  llvm-svn: 317038
* [X86][AsmParser] Treat '%' as the modulo operator under Intel syntax | Reid Kleckner | 2017-10-31 | 1 | -0/+1
  It can't be a register prefix, anyway. This is consistent with the masm docs on MSDN: https://msdn.microsoft.com/en-us/library/t4ax90d2.aspx
  This is a straightforward extension of our support for "MOD" implemented in https://reviews.llvm.org/D33876 / r306425
  llvm-svn: 317011
* [X86][SSE] Add VSRLI/VSRAI/VSLLI demanded elts support to computeKnownBits/ComputeNumSignBits | Simon Pilgrim | 2017-10-31 | 1 | -5/+6
  Mainly a perf improvement, as most combines will have occurred before we lower to these instructions.
  llvm-svn: 317005
* [AVX512] Adding new patterns for extract_subvector of vXi1 | Michael Zuckerman | 2017-10-31 | 1 | -14/+42
  Extracting a subvector of vXi1 from vYi1 is poorly supported by LLVM and most of the time ends in an assertion failure. This patch fixes this issue by adding new patterns to the TD file.
  Reviewers:
  1. guyblank
  2. igorb
  3. zvi
  4. ayman
  5. craig.topper
  Differential Revision: https://reviews.llvm.org/D39292
  Change-Id: Ideb4d7e946c8d40cfce2920891f2d89fe64c58f8
  llvm-svn: 316981
* [X86] Make AVX512_512_SET0 XMM16-31 lower to 128-bit XOR when AVX512VL is enabled. | Craig Topper | 2017-10-31 | 1 | -13/+2
  Use 128-bit VLX instruction when VLX is enabled.
  Unfortunately, this weakens our ability to do domain fixing when AVX512DQ is not enabled, but it is consistent with our 256-bit behavior.
  Maybe we should add custom handling to domain fixing to allow EVEX integer XOR/AND/OR/ANDN to switch to VEX encoded fp instructions if the high registers aren't being used?
  llvm-svn: 316978
* [X86] Clang-format some code. NFC | Craig Topper | 2017-10-31 | 1 | -2/+8
  llvm-svn: 316973
* [AArch64]: range loopify frame-lowering | Javed Absar | 2017-10-30 | 1 | -2/+2
  llvm-svn: 316960
* [X86] Add AVX512 support to fast isel's X86ChooseCmpOpcode. | Craig Topper | 2017-10-30 | 1 | -2/+3
  llvm-svn: 316955
* Revert "[PowerPC] Try to simplify a Swap if it feeds a Splat" | Stefan Pintilie | 2017-10-30 | 1 | -47/+0
  Revert r316478. A test case has failed. Will recommit this change once we find and fix the failure.
  This reverts commit 7c330fabaedaba3d02c58bc3cc1198896c895f34.
  llvm-svn: 316952
* [X86][AVX512] Adding a pattern for broadcastm intrinsic. | Jina Nahias | 2017-10-30 | 1 | -0/+64
  Differential Revision: https://reviews.llvm.org/D38312
  Change-Id: I71c8605a8e4c98013ef25289694afc5cfd46bb0b
  llvm-svn: 316921
* Move isDSOLocal check and add a comment. | Rafael Espindola | 2017-10-30 | 1 | -2/+12
  llvm-svn: 316920
* [PPC CodeGen] Fix the bitreverse.i64 intrinsic. | Fangrui Song | 2017-10-30 | 1 | -71/+34
  Summary: The two 32-bit words were swapped. Update a test omitted in reverted r316270.
  Reviewers: jtony, aaron.ballman
  Subscribers: nemanjai, kbarton
  Differential Revision: https://reviews.llvm.org/D39163
  llvm-svn: 316916
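To show why the placement of the two 32-bit words matters, here is a small Python sketch of a 64-bit bit reversal built from two 32-bit reversals. It is an illustration of the kind of mistake the summary describes, not a reconstruction of the PowerPC lowering; all function names are hypothetical.

```python
def bitreverse(value, width):
    """Reverse the low `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bitreverse64_from_halves(value):
    """Reverse each 32-bit word, then exchange the two words."""
    lo = value & 0xFFFFFFFF
    hi = (value >> 32) & 0xFFFFFFFF
    return (bitreverse(lo, 32) << 32) | bitreverse(hi, 32)


def bitreverse64_words_misplaced(value):
    """Incorrect variant: each word is reversed but left in its original position."""
    lo = value & 0xFFFFFFFF
    hi = (value >> 32) & 0xFFFFFFFF
    return (bitreverse(hi, 32) << 32) | bitreverse(lo, 32)
```

With the input 1, the correct composition yields bit 63 set, while the misplaced variant yields bit 31 set. A test that only uses inputs whose two words are mirror images of each other would not catch the difference.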
* [X86] Make sure we don't create locked inc/dec instructions when the carry flag is being used. | Craig Topper | 2017-10-30 | 4 | -19/+84
  Summary:
  INC/DEC don't update the carry flag so we need to make sure we don't try to use it.
  This patch introduces new X86ISD opcodes for locked INC/DEC. Teaches lowerAtomicArithWithLOCK to emit these nodes if INC/DEC is not slow or the function is being optimized for size. An additional flag is added that allows the INC/DEC to be disabled if the caller determines that the carry flag is being requested.
  The test_sub_1_cmp_1_setcc_ugt test is currently showing this bug. The other test case changes are recovering cases that were regressed in r316860.
  This should fully fix PR35068 finishing the fix started in r316860.
  Reviewers: RKSimon, zvi, spatel
  Reviewed By: zvi
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D39411
  llvm-svn: 316913
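The flag behaviour behind this fix can be modeled in a few lines. This is a simplified Python sketch that tracks only the 32-bit result and the carry flag (CF); `sub_imm` and `dec` are hypothetical helpers, and locking, the other flags and the actual lowering code are not modeled.

```python
MASK32 = 0xFFFFFFFF


def sub_imm(value, imm, carry_in):
    """Model SUB: CF is overwritten and set when an unsigned borrow occurs."""
    result = (value - imm) & MASK32
    carry_out = (value & MASK32) < (imm & MASK32)
    return result, carry_out


def dec(value, carry_in):
    """Model DEC: same result as SUB 1, but CF keeps whatever it held before."""
    return (value - 1) & MASK32, carry_in
```

Decrementing 0 produces the same result either way, but only `sub_imm` reports the borrow. A consumer that reads CF afterwards, such as an unsigned comparison, would see a stale value after `dec`, which is why the substitution has to be suppressed when the carry flag is used.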
* [X86] Remove AVX512 early out from X86FastISel::X86SelectCmp. | Craig Topper | 2017-10-30 | 1 | -3/+0
  This shouldn't be needed anymore since i1 isn't a legal type.
  llvm-svn: 316912
* [AMDGPU] Emit metadata for hidden arguments for kernel enqueue | Yaxun Liu | 2017-10-30 | 2 | -4/+51
  Identifies kernels which perform device side kernel enqueues and emits metadata for the associated hidden kernel arguments. Such kernels are marked with the calls-enqueue-kernel function attribute by the AMDGPUOpenCLEnqueueKernelLowering pass, and later on the hidden kernel argument metadata HiddenDefaultQueue and HiddenCompletionAction are emitted for them.
  Differential Revision: https://reviews.llvm.org/D39255
  llvm-svn: 316907
* [CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2). | Clement Courbet | 2017-10-30 | 4 | -9/+44
  - Targets that want to support memcmp expansions now return the list of supported load sizes.
  - Expansion codegen does not assume that all power-of-two load sizes smaller than the max load size are valid. For example, this is not the case for x86(32bit)+sse2.
  Fixes PR34887.
  llvm-svn: 316905
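The difference between a target-supplied list of load sizes and an assumed power-of-two ladder can be sketched as follows. This is a hypothetical Python illustration, not the ExpandMemCmp implementation: `plan_loads` is an invented helper that greedily covers a compare length with the largest permitted loads, and the size list `[16, 4, 2, 1]` is an assumed example of a target that has 16-byte loads but no 8-byte loads.

```python
def plan_loads(length, load_sizes):
    """Greedily split `length` bytes into loads drawn only from `load_sizes`."""
    plan = []
    remaining = length
    # Try the largest permitted size first; ignore non-positive entries.
    for size in sorted((s for s in load_sizes if s > 0), reverse=True):
        while remaining >= size:
            plan.append(size)
            remaining -= size
    if remaining:
        raise ValueError("load sizes cannot cover the requested length")
    return plan
```

With the assumed list, a 15-byte compare is covered by three 4-byte loads, one 2-byte load and one 1-byte load; a planner that assumed every power of two below 16 was valid would have emitted an 8-byte load the target cannot perform.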
* [Hexagon] Allow the RDF optimizations to be run in .mir testcases | Krzysztof Parzyszek | 2017-10-30 | 2 | -5/+7
  llvm-svn: 316904
* [GlobalISel|ARM] : Allow legalizing G_FSUB | Javed Absar | 2017-10-30 | 2 | -8/+9
  Adding support for VSUB.
  Reviewed by: @rovka
  Differential Revision: https://reviews.llvm.org/D39261
  llvm-svn: 316902