summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Merged Reverse/Alternate shuffle cost tables. NFCI.Simon Pilgrim2017-01-041-141/+81
| | | | | | As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode. llvm-svn: 290956
* [framelowering] Skip dbg values when getting next/previous instruction.Florian Hahn2017-01-041-8/+14
| | | | | | | | | | | | | | | | | | | Summary: In mergeSPUpdates, debug values need to be ignored when getting the previous element, otherwise debug data could have an impact on codegen. In eliminateCallFramePseudoInstr, debug values after the erased element could have an impact on codegen and should be skipped. Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319) Reviewers: aprantl, MatzeB, mkuper Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D27688 llvm-svn: 290955
* [LLC][MIPS] Fix crash after enabling LLVM_ENABLE_EXPENSIVE_CHECKSNitesh Jain2017-01-042-0/+8
| | | | | | | | | Reviewers: sdardis, vkalintiris Subscribers: jaydeep, slthakur, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D27841 llvm-svn: 290949
* [X86][AVX512] Passing the appropriate memory operand class to ↵Ayman Musa2017-01-042-26/+43
| | | | | | | | | | INT_{U}COMIS{S|D} instructions Replacing the memory operand in the intrinsic versions of the comis/ucomis instrucions from f128mem to ssmem/sdmem accordingly. Differential Revision: https://reviews.llvm.org/D28138 llvm-svn: 290948
* [X86] Attempt to pre-truncate arithmetic operations if usefulSimon Pilgrim2017-01-041-0/+81
| | | | | | | | | | | | | | In some cases its more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types. This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or an one use constant we can fold). Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs. I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need. Differential Revision: https://reviews.llvm.org/D28219 llvm-svn: 290947
* [AVX-512] Add support for detecting 512-bit shuffles that contain a 128-bit ↵Craig Topper2017-01-041-3/+33
| | | | | | | | subvector insertion from the lowest subvector of one of the sources. These are best handled with a vinsert32x4 or vinsert64x2 instruction. llvm-svn: 290946
* [AVX-512] Simplify code for creating 512-bit SHUF128 operations.Craig Topper2017-01-041-18/+11
| | | | | | We don't need two loops and we can safely assume assume and hardcode the size of the widened mask. llvm-svn: 290942
* [Hexagon, TableGen] Fix some Clang-tidy modernize and Include What You Use ↵Eugene Zelenko2017-01-0412-385/+301
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 290925
* [X86] Move 128-bit shuffle mask widening check into lowerV2X128VectorShuffle ↵Craig Topper2017-01-031-22/+17
| | | | | | to reduce code duplication. Use the now available widened mask to simplify some code inside lowerV2X128VectorShuffle. llvm-svn: 290872
* [AVX-512] Simplify the code added in r290870 to recognized 256-bit subvector ↵Craig Topper2017-01-031-30/+7
| | | | | | inserts and avoid calling isShuffleEquivalent on a widened mask. llvm-svn: 290871
* [AVX-512] Teach shuffle lowering to use vinsert instructions for shuffles ↵Craig Topper2017-01-031-0/+39
| | | | | | corresponding to 256-bit subvector inserts. llvm-svn: 290870
* [AVX-512] Teach EVEX to VEX conversion pass to handle VINSERT and VEXTRACT ↵Craig Topper2017-01-031-0/+16
| | | | | | instructions. llvm-svn: 290869
* [X86] Remove trailing whitespace and an unnecessary line wrap. NFCCraig Topper2017-01-031-37/+35
| | | | llvm-svn: 290867
* [X86] Fix header comment. NFCCraig Topper2017-01-031-1/+1
| | | | llvm-svn: 290866
* [AVX-512] Add support for pushing bitcasts through INSERT_SUBVEC in order to ↵Craig Topper2017-01-031-0/+23
| | | | | | select a masked operation. llvm-svn: 290865
* [AVX-512] Remove vinsert intrinsics and autoupgrade to native ↵Craig Topper2017-01-032-34/+2
| | | | | | shufflevectors. There are some codegen problems here that I'll try to fix in future commits. llvm-svn: 290864
* [AVX-512] Remove vextract intrinsics and autoupgrade to native ↵Craig Topper2017-01-031-27/+0
| | | | | | | | shufflevectors. This unfortunately generates some really terrible code without VLX support due to v2i1 and v4i1 not being legal. Hopefully we can improve that in future patches. llvm-svn: 290863
* [XRay] Merge instrumentation point table emission code into AsmPrinter.Dean Michael Berris2017-01-036-150/+2
| | | | | | | | | | | | | | | | | | Summary: No need to have this per-architecture. While there, unify 32-bit ARM's behaviour with what changed elsewhere and start function names lowercase as per the coding standards. Individual entry emission code goes to the entry's own class. Fully tested on amd64, cross-builds on both ARMs and PowerPC. Reviewers: dberris Subscribers: aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D28209 llvm-svn: 290858
* Fixed shuffle-reverse cost on AVX-512.Elena Demikhovsky2017-01-021-0/+1
| | | | | | (This changed was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately). llvm-svn: 290812
* AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.Elena Demikhovsky2017-01-022-9/+250
| | | | | | | | | | | | X86 target does not provide any target specific cost calculation for interleave patterns.It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost. In this patch I calculate the cost on AVX-512. It will allow to compare interleave pattern with gather/scatter and choose a better solution (PR31426). * Shiffle-broadcast cost will be changed in Simon's upcoming patch. Differential Revision: https://reviews.llvm.org/D28118 llvm-svn: 290810
* [AVR] Optimize 16-bit ANDs with '1'Dylan McKay2016-12-311-0/+4
| | | | | | | | | | | | Summary: Fixes PR 31345 Reviewers: dylanmckay Subscribers: fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D28186 llvm-svn: 290778
* Caught a simple typo. I do not know of a way to test this, but it seems like ↵Aaron Ballman2016-12-301-1/+1
| | | | | | an unlikely thing to regress in the future. llvm-svn: 290757
* [AVR] Optimize 16-bit ORs with '0'Dylan McKay2016-12-301-12/+27
| | | | | | | | | | | | | | Summary: Fixes PR 31344 Authored by Anmol P. Paralkar Reviewers: dylanmckay Subscribers: fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D28121 llvm-svn: 290732
* Revert "[COFF] Use 32-bit jump table entries in .rdata for Win64"Reid Kleckner2016-12-292-26/+0
| | | | | | | | This reverts commit r290694. It broke sanitizer tests on Win64. I'll probably bring this back, but the jump tables will just live in .text like they do for MSVC. llvm-svn: 290714
* [AMDGPU][mc] Enable absolute expressions in .hsa_code_object_isa directiveArtem Tamazov2016-12-291-12/+17
| | | | | | | | | | | Among other stuff, this allows to use predefined .option.machine_version_major /minor/stepping symbols in the directive. Relevant test expanded at once (also file renamed for clarity). Differential Revision: https://reviews.llvm.org/D28140 llvm-svn: 290710
* [COFF] Use 32-bit jump table entries in .rdata for Win64Reid Kleckner2016-12-292-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: We were already using 32-bit jump table entries, but this was a consequence of the default PIC model on Win64, and not an intentional design decision. This patch ensures that we always use 32-bit label difference jump table entries on Win64 regardless of the PIC model. This is a good idea because it saves executable size and object file size. Moving the jump tables to .rdata cleans up the disassembled object code and reduces the available ROP targets, but it requires adding one more RIP-relative lea to the code. COFF doesn't have relocations to express the difference between two arbitrary symbols, so we can't use the jump table label in the label difference like we do elsewhere. Fixes PR31488 Reviewers: majnemer, compnerd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28141 llvm-svn: 290694
* This is a large patch for X86 AVX-512 of an optimization for reducing code ↵Gadi Haber2016-12-287-0/+1384
| | | | | | | | | | | | size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible. There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers. The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled. Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky Differential Revision: https://reviews.llvm.org/D27901 llvm-svn: 290663
* [AArch64][AsmParser] Add support for parsing shift/extend operands with symbols.Chad Rosier2016-12-271-3/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D27953 llvm-svn: 290609
* [AMDGPU][llvm-mc] Predefined symbols to access register counts ↵Artem Tamazov2016-12-271-7/+56
| | | | | | | | | | | | | | | | | | | | | | | (.kernel.{v|s}gpr_count) The feature allows for conditional assembly, filling the entries of .amd_kernel_code_t etc. Symbols are defined with value 0 at the beginning of each kernel scope. After each register usage, the respective symbol is set to: value = max( value, ( register index + 1 ) ) Thus, at the end of scope the value represents a count of used registers. Kernel scopes begin at .amdgpu_hsa_kernel directive, end at the next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also dummy scope that lies from the beginning of source file til the first .amdgpu_hsa_kernel. Test added. Differential Revision: https://reviews.llvm.org/D27859 llvm-svn: 290608
* [AMDGPU] Assembler: support SDWA and DPP for VOP2b instructionsSam Kolton2016-12-273-6/+37
| | | | | | | | | | Reviewers: nhaustov, artem.tamazov, vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28051 llvm-svn: 290599
* [AVX-512] Add all forms of VPALIGNR, VALIGND, and VALIGNQ to the load ↵Craig Topper2016-12-271-2/+27
| | | | | | folding tables. llvm-svn: 290591
* [AVX-512] Remove masked pmuldq and pmuludq intrinsics and autoupgrade them ↵Craig Topper2016-12-271-12/+0
| | | | | | to unmasked intrinsics plus a select. llvm-svn: 290583
* [AVX-512] Add 512-bit unmasked intrinsics for pmuldq and pmuludq so we can ↵Craig Topper2016-12-271-0/+2
| | | | | | | | add them to InstCombine with the 128 and 256 bit versions. The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will removed and autoupgraded. llvm-svn: 290573
* [AVX-512] Add isel patterns to turn native masked scalar add/sub/mul/div ↵Craig Topper2016-12-271-0/+22
| | | | | | into masked instructions. llvm-svn: 290564
* [AVX-512] Fix some patterns to use extended register classes.Craig Topper2016-12-261-64/+73
| | | | llvm-svn: 290536
* [AVX-512] Don't assume that the rounding mode argument to intrinsics is a ↵Craig Topper2016-12-261-16/+17
| | | | | | | | constant. While clang will guarantee this, nothing in the backend will. A non-constant value will now result in an isel error instead of just asserting or crashing due to a bad cast during lowering. llvm-svn: 290532
* revert commit 290516Michael Zuckerman2016-12-251-1/+0
| | | | llvm-svn: 290517
* Commit try added new empty lineMichael Zuckerman2016-12-251-0/+1
| | | | llvm-svn: 290516
* AMDGPU: split ret/noret patterns for global atomicsJan Vesely2016-12-233-22/+52
| | | | | | Differential Revision: https://reviews.llvm.org/D27989 llvm-svn: 290435
* [AArch64] Cortex-A57 FDIV/FSQRT scheduling fix (W-unit)Renato Golin2016-12-232-18/+18
| | | | | | | | | | | | | | According to the Cortex-A57 doc, FDIV/FSQRT instructions should use F0 unit (W-unit in AArch64SchedA57.td, the same as cryptography instructions), not F1 unit (X-unit in td, like ASIMD absolute diff accum SABA/UABA). This patch changes FDIV/FSQRT scheduling declarations to use A57UnitW instead of A57UnitX. Also, latencies for those instructions are corrected. Patch by Andrew Zhogin. llvm-svn: 290426
* Revert r290423 because it broke the sanitizer-x86_64-linux-autoconf buildbot.Florian Hahn2016-12-231-5/+0
| | | | llvm-svn: 290425
* [framelowering] Skip dbg values when getting next/previous instruction.Florian Hahn2016-12-231-0/+5
| | | | | | | | | | | | | | | | | | | Summary: In mergeSPUpdates, debug values need to be ignored when getting the previous element, otherwise debug data could have an impact on codegen. In eliminateCallFramePseudoInstr, debug values after the erased element could have an impact on codegen and should be skipped. Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319) Reviewers: mkuper, MatzeB, aprantl Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D27688 llvm-svn: 290423
* [WebAssembly] Annotate call and load/store immediates.Dan Gohman2016-12-234-26/+36
| | | | | | These will be used to guide the binary encoding of these immediates. llvm-svn: 290412
* Enable '-Wstring-conversion' and fix some bad asserts that it helpedChandler Carruth2016-12-231-1/+1
| | | | | | | | find. Notable is the assert in NewGVN which had no effect because of the bug. llvm-svn: 290400
* [AArch64][CallLowering] Constraint registers on target specific instructionQuentin Colombet2016-12-221-1/+12
| | | | | | | | | | | The InstructionSelect pass will not look at target specific instructions since they are already selected. As a result, the operands of target specific instructions must be properly constrained, because it is not going to fix them. This fixes invalid register classes on call instruction. llvm-svn: 290377
* AMDGPU: Invert cmp + select with constantMatt Arsenault2016-12-221-0/+19
| | | | | | | | | | | Canonicalize a select with a constant to the false side. This enables more instruction shrinking opportunities since an inline immediate can be used for the false side of v_cndmask_b32_e32. This seems to usually be better but causes some code size regressions in some tests. llvm-svn: 290372
* [Hexagon] Add DAG mutations for machine pipelinerKrzysztof Parzyszek2016-12-222-0/+9
| | | | llvm-svn: 290366
* Change the interface of TLI.isMultiStoresCheaperThanBitsMerge.Wei Mi2016-12-221-8/+4
| | | | | | | | | This is for splitMergedValStore in DAG Combine to share the target query interface with similar logic in CodeGenPrepare. Differential Revision: https://reviews.llvm.org/D24707 llvm-svn: 290363
* [mips] Fix compact branch hazard detection, part 2Petar Jovanovic2016-12-221-22/+15
| | | | | | | | | | | Follow up to D27209 fix, this patch now properly handles single transient instruction in basic block. Patch by Aleksandar Beserminji. Differential Revision: https://reviews.llvm.org/D27856 llvm-svn: 290361
* AMDGPU: Use i16 for i16 shift amountMatt Arsenault2016-12-222-8/+10
| | | | llvm-svn: 290351
OpenPOWER on IntegriCloud