path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [AArch64] Update the feature set for Qualcomm's Falkor CPU.Chad Rosier2017-01-041-1/+5
llvm-svn: 291010
* [AArch64] Fix over-eager early-exit in load-store combinerNirav Dave2017-01-041-0/+3
Fix early-exit analysis for memory operation pairing when operations are not emitted in ascending order. Reviewers: mcrosier, t.p.northover Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D28251 llvm-svn: 291008
* [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)Hal Finkel2017-01-041-40/+39
This change aims to unify and correct our logic for when we need to allow for the possibility of the linker adding a TOC restoration instruction after a call. This comes up in two contexts:

1. When determining tail-call eligibility. If we make a tail call (i.e. directly branch to a function) then there is no place for the linker to add a TOC restoration.
2. When determining when we need to add a nop instruction after a call. Likewise, if there is a possibility that the linker might need to add a TOC restoration after a call, then we need to put a nop after the call (the bl instruction).

First problem: We were using similar, but different, logic to decide (1) and (2). This is just wrong. Both the resideInSameModule function (used when determining tail-call eligibility) and the isLocalCall function (used when deciding if the post-call nop is needed) were supposed to be determining the same underlying fact (i.e. might a TOC restoration be needed after the call). The same logic should be used in both places.

Second problem: The logic in both places was wrong. We only know that two functions will share the same TOC when both functions come from the same section of the same object. Otherwise the linker might cause the functions to use different TOC base addresses (unless the multi-TOC linker option is disabled, in which case only shared-library boundaries are relevant). There are a number of factors that can cause functions to be placed in different sections or come from different objects (-ffunction-sections, explicitly-specified section names, COMDAT, weak linkage, etc.). All of these need to be checked. The existing logic only checked properties of the callee, but the properties of the caller must also be checked (for example, calling from a function in a COMDAT section means calling between sections).

There was a conceptual error in the resideInSameModule function in that it allowed tail calls to functions with weak linkage and protected/hidden visibility. While protected/hidden visibility does prevent the function implementation from being replaced at runtime (via interposition), it does not prevent the linker from using an alternate implementation at link time (i.e. using some strong definition to replace the provided weak one during linking). If this happens, then we're still potentially looking at a required TOC restoration upon return.

Otherwise, in general, the post-call nop is needed wherever ELF interposition needs to be supported. We don't currently support ELF interposition at the IR level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html for more information), and I don't think we should try to make it appear to work in the backend in spite of that fact. Unfortunately, because of the way that the ABI works, we need to generate code as if we supported interposition whenever the linker might insert stubs for the purpose of supporting it.

Differential Revision: https://reviews.llvm.org/D27231 llvm-svn: 291003
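The rule described in this message (one shared predicate, checking both caller and callee) can be sketched roughly as follows. This is an illustration only, not the code from the patch: `FuncInfo` and all three function names are hypothetical, and real tail-call eligibility depends on further conditions not modelled here.

```cpp
#include <string>

// Hypothetical summary of the properties the commit message lists.
struct FuncInfo {
  int ObjectId;         // which object file the function comes from
  std::string Section;  // section the function is placed in
  bool IsDefinition;    // false for a mere declaration
  bool IsWeak;          // weak linkage: the linker may pick another definition
  bool InComdat;        // COMDAT: the kept copy may come from elsewhere
};

// True only when both functions are known to come from the same section of
// the same object, so they must share a TOC base.
bool knownToShareTOC(const FuncInfo &Caller, const FuncInfo &Callee) {
  if (!Callee.IsDefinition || Callee.IsWeak)
    return false;
  if (Caller.InComdat || Callee.InComdat)
    return false;
  return Caller.ObjectId == Callee.ObjectId &&
         Caller.Section == Callee.Section;
}

// The single underlying fact: might the linker need a TOC restoration?
bool mayNeedTOCRestore(const FuncInfo &Caller, const FuncInfo &Callee) {
  return !knownToShareTOC(Caller, Callee);
}

// Both decisions are derived from the same predicate.
bool tocAllowsTailCall(const FuncInfo &Caller, const FuncInfo &Callee) {
  return !mayNeedTOCRestore(Caller, Callee);
}
bool needsNopAfterCall(const FuncInfo &Caller, const FuncInfo &Callee) {
  return mayNeedTOCRestore(Caller, Callee);
}
```

Deriving both answers from one function is the point of the sketch: the two call sites cannot drift apart the way resideInSameModule and isLocalCall did.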
* Remove dead and unused variable NumSentinelElements.Eric Christopher2017-01-041-2/+2
Fixes PR31529. llvm-svn: 290998
* AMDGPU/SI: Implement sendmsghalt intrinsicJan Vesely2017-01-046-4/+21
v2: expose using amdgcn prefix Differential Revision: https://reviews.llvm.org/D23511 llvm-svn: 290977
* [CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costsSimon Pilgrim2017-01-041-11/+9
Actual codegen is much better than the extract+insert patterns that were assumed. llvm-svn: 290962
* [X86] Merged Reverse/Alternate shuffle cost tables. NFCI.Simon Pilgrim2017-01-041-141/+81
As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode. llvm-svn: 290956
* [framelowering] Skip dbg values when getting next/previous instruction.Florian Hahn2017-01-041-8/+14
Summary: In mergeSPUpdates, debug values need to be ignored when getting the previous element, otherwise debug data could have an impact on codegen. In eliminateCallFramePseudoInstr, debug values after the erased element could have an impact on codegen and should be skipped. Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319) Reviewers: aprantl, MatzeB, mkuper Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D27688 llvm-svn: 290955
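The idea of stepping over debug values so that they cannot influence codegen can be sketched on a simplified instruction list. `Instr`, `prevNonDebug` and `nextNonDebug` are hypothetical names for this illustration; the actual patch works on LLVM's machine-instruction iterators.

```cpp
#include <vector>

// Hypothetical stand-in for a machine instruction.
struct Instr {
  int Opcode;
  bool IsDebugValue;
};

// Index of the closest non-debug instruction before Pos, or -1 if none.
int prevNonDebug(const std::vector<Instr> &Block, int Pos) {
  for (int I = Pos - 1; I >= 0; --I)
    if (!Block[I].IsDebugValue)
      return I;
  return -1;
}

// Index of the closest non-debug instruction after Pos, or -1 if none.
int nextNonDebug(const std::vector<Instr> &Block, int Pos) {
  for (int I = Pos + 1; I < static_cast<int>(Block.size()); ++I)
    if (!Block[I].IsDebugValue)
      return I;
  return -1;
}
```

With helpers like these, a block compiled with and without debug info presents the same neighbours to the transformation.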
* [LLC][MIPS] Fix crash after enabling LLVM_ENABLE_EXPENSIVE_CHECKSNitesh Jain2017-01-042-0/+8
Reviewers: sdardis, vkalintiris Subscribers: jaydeep, slthakur, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D27841 llvm-svn: 290949
* [X86][AVX512] Passing the appropriate memory operand class to ↵Ayman Musa2017-01-042-26/+43
INT_{U}COMIS{S|D} instructions Replacing the memory operand in the intrinsic versions of the comis/ucomis instructions from f128mem to ssmem/sdmem accordingly. Differential Revision: https://reviews.llvm.org/D28138 llvm-svn: 290948
* [X86] Attempt to pre-truncate arithmetic operations if usefulSimon Pilgrim2017-01-041-0/+81
In some cases it's more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types. This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or a one-use constant we can fold). Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs. I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need. Differential Revision: https://reviews.llvm.org/D28219 llvm-svn: 290947
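The combine is valid because truncation commutes with these operations under wrap-around integer arithmetic. A scalar check of that identity, using 64-bit to 32-bit truncation as a stand-in for the vector case, might look like this (function names are made up for the example):

```cpp
#include <cstdint>

// TRUNC(BINOP(X, Y)): operate in 64 bits, then truncate.
std::uint32_t truncAfterMul(std::uint64_t X, std::uint64_t Y) {
  return static_cast<std::uint32_t>(X * Y);
}
std::uint32_t truncAfterAdd(std::uint64_t X, std::uint64_t Y) {
  return static_cast<std::uint32_t>(X + Y);
}
std::uint32_t truncAfterXor(std::uint64_t X, std::uint64_t Y) {
  return static_cast<std::uint32_t>(X ^ Y);
}

// BINOP(TRUNC(X), TRUNC(Y)): truncate first, then operate in 32 bits.
std::uint32_t mulAfterTrunc(std::uint64_t X, std::uint64_t Y) {
  return static_cast<std::uint32_t>(X) * static_cast<std::uint32_t>(Y);
}
std::uint32_t addAfterTrunc(std::uint64_t X, std::uint64_t Y) {
  return static_cast<std::uint32_t>(X) + static_cast<std::uint32_t>(Y);
}
std::uint32_t xorAfterTrunc(std::uint64_t X, std::uint64_t Y) {
  return static_cast<std::uint32_t>(X) ^ static_cast<std::uint32_t>(Y);
}
```

The identity says nothing about profitability, which is the cost-model question the message raises; it only shows the rewrite preserves the result.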
* [AVX-512] Add support for detecting 512-bit shuffles that contain a 128-bit ↵Craig Topper2017-01-041-3/+33
subvector insertion from the lowest subvector of one of the sources. These are best handled with a vinsert32x4 or vinsert64x2 instruction. llvm-svn: 290946
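A much simplified sketch of this kind of mask matching is shown below. It assumes the usual shuffle-mask convention (indices below the element count select from the first source, higher indices from the second, negative means undefined) and only handles the non-commuted case; `matchLowSubvectorInsert` is a hypothetical name, not the function added by the commit.

```cpp
#include <vector>

// Returns the 128-bit lane (0..3) of the result that is filled from the
// lowest 128-bit lane of the second source while every other lane passes the
// first source through unchanged; returns -1 if the mask has no such shape.
int matchLowSubvectorInsert(const std::vector<int> &Mask) {
  const int NumElts = static_cast<int>(Mask.size());
  if (NumElts == 0 || NumElts % 4 != 0)
    return -1;
  const int LaneElts = NumElts / 4;  // elements per 128-bit lane
  for (int Lane = 0; Lane < 4; ++Lane) {
    bool Matches = true;
    for (int I = 0; I < NumElts && Matches; ++I) {
      if (Mask[I] < 0)
        continue;  // undefined element matches anything
      const bool InInsertedLane = (I / LaneElts) == Lane;
      const int Expected = InInsertedLane ? NumElts + (I % LaneElts) : I;
      Matches = (Mask[I] == Expected);
    }
    if (Matches)
      return Lane;
  }
  return -1;
}
```

For a 16-element mask the returned lane would correspond to the immediate of a vinsert32x4-style insertion; an 8-element mask corresponds to the vinsert64x2 form.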
* [AVX-512] Simplify code for creating 512-bit SHUF128 operations.Craig Topper2017-01-041-18/+11
We don't need two loops and we can safely assume and hardcode the size of the widened mask. llvm-svn: 290942
* [Hexagon, TableGen] Fix some Clang-tidy modernize and Include What You Use ↵Eugene Zelenko2017-01-0412-385/+301
warnings; other minor fixes (NFC). llvm-svn: 290925
* [X86] Move 128-bit shuffle mask widening check into lowerV2X128VectorShuffle ↵Craig Topper2017-01-031-22/+17
to reduce code duplication. Use the now available widened mask to simplify some code inside lowerV2X128VectorShuffle. llvm-svn: 290872
* [AVX-512] Simplify the code added in r290870 to recognize 256-bit subvector ↵Craig Topper2017-01-031-30/+7
inserts and avoid calling isShuffleEquivalent on a widened mask. llvm-svn: 290871
* [AVX-512] Teach shuffle lowering to use vinsert instructions for shuffles ↵Craig Topper2017-01-031-0/+39
corresponding to 256-bit subvector inserts. llvm-svn: 290870
* [AVX-512] Teach EVEX to VEX conversion pass to handle VINSERT and VEXTRACT ↵Craig Topper2017-01-031-0/+16
instructions. llvm-svn: 290869
* [X86] Remove trailing whitespace and an unnecessary line wrap. NFCCraig Topper2017-01-031-37/+35
llvm-svn: 290867
* [X86] Fix header comment. NFCCraig Topper2017-01-031-1/+1
llvm-svn: 290866
* [AVX-512] Add support for pushing bitcasts through INSERT_SUBVEC in order to ↵Craig Topper2017-01-031-0/+23
select a masked operation. llvm-svn: 290865
* [AVX-512] Remove vinsert intrinsics and autoupgrade to native ↵Craig Topper2017-01-032-34/+2
shufflevectors. There are some codegen problems here that I'll try to fix in future commits. llvm-svn: 290864
* [AVX-512] Remove vextract intrinsics and autoupgrade to native ↵Craig Topper2017-01-031-27/+0
shufflevectors. This unfortunately generates some really terrible code without VLX support due to v2i1 and v4i1 not being legal. Hopefully we can improve that in future patches. llvm-svn: 290863
* [XRay] Merge instrumentation point table emission code into AsmPrinter.Dean Michael Berris2017-01-036-150/+2
Summary: No need to have this per-architecture. While there, unify 32-bit ARM's behaviour with what changed elsewhere and start function names lowercase as per the coding standards. Individual entry emission code goes to the entry's own class. Fully tested on amd64, cross-builds on both ARMs and PowerPC. Reviewers: dberris Subscribers: aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D28209 llvm-svn: 290858
* Fixed shuffle-reverse cost on AVX-512.Elena Demikhovsky2017-01-021-0/+1
(This change was approved in https://reviews.llvm.org/D28118, but Simon asked that it be submitted separately.) llvm-svn: 290812
* AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.Elena Demikhovsky2017-01-022-9/+250
The X86 target does not provide any target-specific cost calculation for interleave patterns. It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost. In this patch I calculate the cost on AVX-512. This makes it possible to compare the interleave pattern with gather/scatter and choose the better solution (PR31426). * Shuffle-broadcast cost will be changed in Simon's upcoming patch. Differential Revision: https://reviews.llvm.org/D28118 llvm-svn: 290810
* [AVR] Optimize 16-bit ANDs with '1'Dylan McKay2016-12-311-0/+4
Summary: Fixes PR 31345 Reviewers: dylanmckay Subscribers: fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D28186 llvm-svn: 290778
* Caught a simple typo. I do not know of a way to test this, but it seems like ↵Aaron Ballman2016-12-301-1/+1
an unlikely thing to regress in the future. llvm-svn: 290757
* [AVR] Optimize 16-bit ORs with '0'Dylan McKay2016-12-301-12/+27
Summary: Fixes PR 31344 Authored by Anmol P. Paralkar Reviewers: dylanmckay Subscribers: fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D28121 llvm-svn: 290732
* Revert "[COFF] Use 32-bit jump table entries in .rdata for Win64"Reid Kleckner2016-12-292-26/+0
This reverts commit r290694. It broke sanitizer tests on Win64. I'll probably bring this back, but the jump tables will just live in .text like they do for MSVC. llvm-svn: 290714
* [AMDGPU][mc] Enable absolute expressions in .hsa_code_object_isa directiveArtem Tamazov2016-12-291-12/+17
Among other things, this allows the predefined .option.machine_version_major/minor/stepping symbols to be used in the directive. The relevant test is expanded as well (and the file renamed for clarity). Differential Revision: https://reviews.llvm.org/D28140 llvm-svn: 290710
* [COFF] Use 32-bit jump table entries in .rdata for Win64Reid Kleckner2016-12-292-0/+26
Summary: We were already using 32-bit jump table entries, but this was a consequence of the default PIC model on Win64, and not an intentional design decision. This patch ensures that we always use 32-bit label difference jump table entries on Win64 regardless of the PIC model. This is a good idea because it saves executable size and object file size. Moving the jump tables to .rdata cleans up the disassembled object code and reduces the available ROP targets, but it requires adding one more RIP-relative lea to the code. COFF doesn't have relocations to express the difference between two arbitrary symbols, so we can't use the jump table label in the label difference like we do elsewhere. Fixes PR31488 Reviewers: majnemer, compnerd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28141 llvm-svn: 290694
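As a generic illustration of why label-difference entries save space, the sketch below stores each target as a signed 32-bit offset from some base address instead of a full 64-bit pointer. Which symbol serves as the base on COFF is deliberately left open here; `buildRelativeTable` and `resolveEntry` are hypothetical helpers, not backend code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode each 64-bit target as a 32-bit signed difference from Base.
// Assumes every target lies within +/-2 GiB of Base.
std::vector<std::int32_t>
buildRelativeTable(std::uint64_t Base,
                   const std::vector<std::uint64_t> &Targets) {
  std::vector<std::int32_t> Table;
  Table.reserve(Targets.size());
  for (std::uint64_t Target : Targets) {
    const std::int64_t Diff =
        static_cast<std::int64_t>(Target) - static_cast<std::int64_t>(Base);
    Table.push_back(static_cast<std::int32_t>(Diff));
  }
  return Table;
}

// Dispatch side: sign-extend the entry and add it back to the base.
std::uint64_t resolveEntry(std::uint64_t Base,
                           const std::vector<std::int32_t> &Table,
                           std::size_t Index) {
  return Base + static_cast<std::uint64_t>(
                    static_cast<std::int64_t>(Table[Index]));
}
```

Each entry takes 4 bytes instead of 8, at the price of the extra address computation at dispatch time.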
* This is a large patch for X86 AVX-512 of an optimization for reducing code ↵Gadi Haber2016-12-287-0/+1384
size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible. There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers. The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled. Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky Differential Revision: https://reviews.llvm.org/D27901 llvm-svn: 290663
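The eligibility conditions listed in this message can be written as a small predicate. This is only a sketch of the stated conditions: `VecInstr` and its fields are hypothetical, and the real pass works from instruction tables rather than a struct like this.

```cpp
// Hypothetical description of an AVX-512 instruction instance.
struct VecInstr {
  unsigned VectorWidthBits;  // 128, 256 or 512 (512 means zmm registers)
  unsigned MaxRegIndex;      // highest vector register index used
  bool UsesMaskReg;          // true if a k mask register is involved
  bool HasVEXEquivalent;     // assumption: an equivalent VEX form exists
};

// True when the shorter VEX encoding could replace the EVEX encoding.
bool canUseVEXEncoding(const VecInstr &I) {
  return I.HasVEXEquivalent &&
         I.VectorWidthBits <= 256 &&  // no zmm registers
         I.MaxRegIndex <= 15 &&       // only registers 0 - 15
         !I.UsesMaskReg;              // no mask k registers
}
```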
* [AArch64][AsmParser] Add support for parsing shift/extend operands with symbols.Chad Rosier2016-12-271-3/+5
Differential Revision: https://reviews.llvm.org/D27953 llvm-svn: 290609
* [AMDGPU][llvm-mc] Predefined symbols to access register counts ↵Artem Tamazov2016-12-271-7/+56
(.kernel.{v|s}gpr_count) The feature allows for conditional assembly, filling the entries of .amd_kernel_code_t, etc. Symbols are defined with value 0 at the beginning of each kernel scope. After each register usage, the respective symbol is set to: value = max( value, ( register index + 1 ) ). Thus, at the end of a scope the value represents the count of used registers. A kernel scope begins at an .amdgpu_hsa_kernel directive and ends at the next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also a dummy scope that extends from the beginning of the source file until the first .amdgpu_hsa_kernel. Test added. Differential Revision: https://reviews.llvm.org/D27859 llvm-svn: 290608
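The update rule value = max(value, register index + 1) can be modelled in a few lines. The class below is a hypothetical stand-alone model of the two counters, not the assembler's implementation.

```cpp
#include <algorithm>

// Models the .kernel.vgpr_count / .kernel.sgpr_count symbols for one scope.
class RegisterCountTracker {
public:
  // Called at each .amdgpu_hsa_kernel directive: both symbols restart at 0.
  void beginKernelScope() {
    VgprCount = 0;
    SgprCount = 0;
  }
  // After each register use: value = max(value, register index + 1).
  void noteVgprUse(unsigned Index) {
    VgprCount = std::max(VgprCount, Index + 1);
  }
  void noteSgprUse(unsigned Index) {
    SgprCount = std::max(SgprCount, Index + 1);
  }
  unsigned vgprCount() const { return VgprCount; }
  unsigned sgprCount() const { return SgprCount; }

private:
  unsigned VgprCount = 0;
  unsigned SgprCount = 0;
};
```

Because the rule takes a running maximum, the order in which registers are used does not matter: using v3 and then v1 leaves the count at 4.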
* [AMDGPU] Assembler: support SDWA and DPP for VOP2b instructionsSam Kolton2016-12-273-6/+37
Reviewers: nhaustov, artem.tamazov, vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28051 llvm-svn: 290599
* [AVX-512] Add all forms of VPALIGNR, VALIGND, and VALIGNQ to the load ↵Craig Topper2016-12-271-2/+27
folding tables. llvm-svn: 290591
* [AVX-512] Remove masked pmuldq and pmuludq intrinsics and autoupgrade them ↵Craig Topper2016-12-271-12/+0
to unmasked intrinsics plus a select. llvm-svn: 290583
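The equivalence behind this autoupgrade (a masked operation equals the unmasked operation followed by a per-lane select) can be emulated on plain arrays. The sketch uses the 512-bit pmuludq shape, eight 64-bit lanes each multiplying the low 32 bits of its inputs; all names are invented for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec8x64 = std::array<std::uint64_t, 8>;

// Unmasked pmuludq: each lane is the unsigned product of the low 32 bits.
Vec8x64 pmuludq512(const Vec8x64 &A, const Vec8x64 &B) {
  Vec8x64 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = (A[I] & 0xFFFFFFFFu) * (B[I] & 0xFFFFFFFFu);
  return R;
}

// Per-lane select: bit I of Mask picks OnTrue[I], otherwise OnFalse[I].
Vec8x64 selectLanes(std::uint8_t Mask, const Vec8x64 &OnTrue,
                    const Vec8x64 &OnFalse) {
  Vec8x64 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = ((Mask >> I) & 1u) ? OnTrue[I] : OnFalse[I];
  return R;
}

// What the removed masked intrinsic computes, in its upgraded form.
Vec8x64 maskedPmuludq512(const Vec8x64 &A, const Vec8x64 &B,
                         const Vec8x64 &PassThru, std::uint8_t Mask) {
  return selectLanes(Mask, pmuludq512(A, B), PassThru);
}
```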
* [AVX-512] Add 512-bit unmasked intrinsics for pmuldq and pmuludq so we can ↵Craig Topper2016-12-271-0/+2
add them to InstCombine with the 128 and 256 bit versions. The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will be removed and autoupgraded. llvm-svn: 290573
* [AVX-512] Add isel patterns to turn native masked scalar add/sub/mul/div ↵Craig Topper2016-12-271-0/+22
into masked instructions. llvm-svn: 290564
* [AVX-512] Fix some patterns to use extended register classes.Craig Topper2016-12-261-64/+73
llvm-svn: 290536
* [AVX-512] Don't assume that the rounding mode argument to intrinsics is a ↵Craig Topper2016-12-261-16/+17
constant. While clang will guarantee this, nothing in the backend will. A non-constant value will now result in an isel error instead of just asserting or crashing due to a bad cast during lowering. llvm-svn: 290532
* revert commit 290516Michael Zuckerman2016-12-251-1/+0
llvm-svn: 290517
* Commit try added new empty lineMichael Zuckerman2016-12-251-0/+1
llvm-svn: 290516
* AMDGPU: split ret/noret patterns for global atomicsJan Vesely2016-12-233-22/+52
Differential Revision: https://reviews.llvm.org/D27989 llvm-svn: 290435
* [AArch64] Cortex-A57 FDIV/FSQRT scheduling fix (W-unit)Renato Golin2016-12-232-18/+18
According to the Cortex-A57 doc, FDIV/FSQRT instructions should use the F0 unit (W-unit in AArch64SchedA57.td, the same as cryptography instructions), not the F1 unit (X-unit in the td file, like ASIMD absolute diff accum SABA/UABA). This patch changes FDIV/FSQRT scheduling declarations to use A57UnitW instead of A57UnitX. Also, latencies for those instructions are corrected. Patch by Andrew Zhogin. llvm-svn: 290426
* Revert r290423 because it broke the sanitizer-x86_64-linux-autoconf buildbot.Florian Hahn2016-12-231-5/+0
llvm-svn: 290425
* [framelowering] Skip dbg values when getting next/previous instruction.Florian Hahn2016-12-231-0/+5
Summary: In mergeSPUpdates, debug values need to be ignored when getting the previous element, otherwise debug data could have an impact on codegen. In eliminateCallFramePseudoInstr, debug values after the erased element could have an impact on codegen and should be skipped. Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319) Reviewers: mkuper, MatzeB, aprantl Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D27688 llvm-svn: 290423
* [WebAssembly] Annotate call and load/store immediates.Dan Gohman2016-12-234-26/+36
These will be used to guide the binary encoding of these immediates. llvm-svn: 290412
* Enable '-Wstring-conversion' and fix some bad asserts that it helpedChandler Carruth2016-12-231-1/+1
find. Notable is the assert in NewGVN which had no effect because of the bug. llvm-svn: 290400
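The class of bug this warning catches is a string literal used where a boolean condition is expected: the literal converts to a non-null pointer and therefore to true, so an assert built on it can never fire. The exact NewGVN assert is not shown in this log, so the two functions below are only a generic illustration with made-up names.

```cpp
// A condition consisting only of a message string: always true, so
// assert(<this>) would silently never fail.
bool stringOnlyCondition() {
  const char *Msg = "should never get here";
  return static_cast<bool>(Msg);  // non-null pointer converts to true
}

// The usual idiom: a real condition combined with the message.
bool intendedCondition() {
  return false && "should never get here";
}
```

Writing the condition as `false && "message"` keeps the text in the assertion output while letting the check actually fail.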