summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.Elena Demikhovsky2017-01-022-9/+250
| | | | | | | | | | | | X86 target does not provide any target specific cost calculation for interleave patterns.It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost. In this patch I calculate the cost on AVX-512. It will allow to compare interleave pattern with gather/scatter and choose a better solution (PR31426). * Shiffle-broadcast cost will be changed in Simon's upcoming patch. Differential Revision: https://reviews.llvm.org/D28118 llvm-svn: 290810
* [AVR] Optimize 16-bit ANDs with '1'Dylan McKay2016-12-311-0/+4
| | | | | | | | | | | | Summary: Fixes PR 31345 Reviewers: dylanmckay Subscribers: fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D28186 llvm-svn: 290778
* Caught a simple typo. I do not know of a way to test this, but it seems like ↵Aaron Ballman2016-12-301-1/+1
| | | | | | an unlikely thing to regress in the future. llvm-svn: 290757
* [AVR] Optimize 16-bit ORs with '0'Dylan McKay2016-12-301-12/+27
| | | | | | | | | | | | | | Summary: Fixes PR 31344 Authored by Anmol P. Paralkar Reviewers: dylanmckay Subscribers: fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D28121 llvm-svn: 290732
* Revert "[COFF] Use 32-bit jump table entries in .rdata for Win64"Reid Kleckner2016-12-292-26/+0
| | | | | | | | This reverts commit r290694. It broke sanitizer tests on Win64. I'll probably bring this back, but the jump tables will just live in .text like they do for MSVC. llvm-svn: 290714
* [AMDGPU][mc] Enable absolute expressions in .hsa_code_object_isa directiveArtem Tamazov2016-12-291-12/+17
| | | | | | | | | | | Among other stuff, this allows to use predefined .option.machine_version_major /minor/stepping symbols in the directive. Relevant test expanded at once (also file renamed for clarity). Differential Revision: https://reviews.llvm.org/D28140 llvm-svn: 290710
* [COFF] Use 32-bit jump table entries in .rdata for Win64Reid Kleckner2016-12-292-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: We were already using 32-bit jump table entries, but this was a consequence of the default PIC model on Win64, and not an intentional design decision. This patch ensures that we always use 32-bit label difference jump table entries on Win64 regardless of the PIC model. This is a good idea because it saves executable size and object file size. Moving the jump tables to .rdata cleans up the disassembled object code and reduces the available ROP targets, but it requires adding one more RIP-relative lea to the code. COFF doesn't have relocations to express the difference between two arbitrary symbols, so we can't use the jump table label in the label difference like we do elsewhere. Fixes PR31488 Reviewers: majnemer, compnerd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28141 llvm-svn: 290694
* This is a large patch for X86 AVX-512 of an optimization for reducing code ↵Gadi Haber2016-12-287-0/+1384
| | | | | | | | | | | | size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible. There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers. The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled. Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky Differential Revision: https://reviews.llvm.org/D27901 llvm-svn: 290663
* [AArch64][AsmParser] Add support for parsing shift/extend operands with symbols.Chad Rosier2016-12-271-3/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D27953 llvm-svn: 290609
* [AMDGPU][llvm-mc] Predefined symbols to access register counts ↵Artem Tamazov2016-12-271-7/+56
| | | | | | | | | | | | | | | | | | | | | | | (.kernel.{v|s}gpr_count) The feature allows for conditional assembly, filling the entries of .amd_kernel_code_t etc. Symbols are defined with value 0 at the beginning of each kernel scope. After each register usage, the respective symbol is set to: value = max( value, ( register index + 1 ) ) Thus, at the end of scope the value represents a count of used registers. Kernel scopes begin at .amdgpu_hsa_kernel directive, end at the next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also dummy scope that lies from the beginning of source file til the first .amdgpu_hsa_kernel. Test added. Differential Revision: https://reviews.llvm.org/D27859 llvm-svn: 290608
* [AMDGPU] Assembler: support SDWA and DPP for VOP2b instructionsSam Kolton2016-12-273-6/+37
| | | | | | | | | | Reviewers: nhaustov, artem.tamazov, vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28051 llvm-svn: 290599
* [AVX-512] Add all forms of VPALIGNR, VALIGND, and VALIGNQ to the load ↵Craig Topper2016-12-271-2/+27
| | | | | | folding tables. llvm-svn: 290591
* [AVX-512] Remove masked pmuldq and pmuludq intrinsics and autoupgrade them ↵Craig Topper2016-12-271-12/+0
| | | | | | to unmasked intrinsics plus a select. llvm-svn: 290583
* [AVX-512] Add 512-bit unmasked intrinsics for pmuldq and pmuludq so we can ↵Craig Topper2016-12-271-0/+2
| | | | | | | | add them to InstCombine with the 128 and 256 bit versions. The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will removed and autoupgraded. llvm-svn: 290573
* [AVX-512] Add isel patterns to turn native masked scalar add/sub/mul/div ↵Craig Topper2016-12-271-0/+22
| | | | | | into masked instructions. llvm-svn: 290564
* [AVX-512] Fix some patterns to use extended register classes.Craig Topper2016-12-261-64/+73
| | | | llvm-svn: 290536
* [AVX-512] Don't assume that the rounding mode argument to intrinsics is a ↵Craig Topper2016-12-261-16/+17
| | | | | | | | constant. While clang will guarantee this, nothing in the backend will. A non-constant value will now result in an isel error instead of just asserting or crashing due to a bad cast during lowering. llvm-svn: 290532
* revert commit 290516Michael Zuckerman2016-12-251-1/+0
| | | | llvm-svn: 290517
* Commit try added new empty lineMichael Zuckerman2016-12-251-0/+1
| | | | llvm-svn: 290516
* AMDGPU: split ret/noret patterns for global atomicsJan Vesely2016-12-233-22/+52
| | | | | | Differential Revision: https://reviews.llvm.org/D27989 llvm-svn: 290435
* [AArch64] Cortex-A57 FDIV/FSQRT scheduling fix (W-unit)Renato Golin2016-12-232-18/+18
| | | | | | | | | | | | | | According to the Cortex-A57 doc, FDIV/FSQRT instructions should use F0 unit (W-unit in AArch64SchedA57.td, the same as cryptography instructions), not F1 unit (X-unit in td, like ASIMD absolute diff accum SABA/UABA). This patch changes FDIV/FSQRT scheduling declarations to use A57UnitW instead of A57UnitX. Also, latencies for those instructions are corrected. Patch by Andrew Zhogin. llvm-svn: 290426
* Revert r290423 because it broke the sanitizer-x86_64-linux-autoconf buildbot.Florian Hahn2016-12-231-5/+0
| | | | llvm-svn: 290425
* [framelowering] Skip dbg values when getting next/previous instruction.Florian Hahn2016-12-231-0/+5
| | | | | | | | | | | | | | | | | | | Summary: In mergeSPUpdates, debug values need to be ignored when getting the previous element, otherwise debug data could have an impact on codegen. In eliminateCallFramePseudoInstr, debug values after the erased element could have an impact on codegen and should be skipped. Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319) Reviewers: mkuper, MatzeB, aprantl Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D27688 llvm-svn: 290423
* [WebAssembly] Annotate call and load/store immediates.Dan Gohman2016-12-234-26/+36
| | | | | | These will be used to guide the binary encoding of these immediates. llvm-svn: 290412
* Enable '-Wstring-conversion' and fix some bad asserts that it helpedChandler Carruth2016-12-231-1/+1
| | | | | | | | find. Notable is the assert in NewGVN which had no effect because of the bug. llvm-svn: 290400
* [AArch64][CallLowering] Constraint registers on target specific instructionQuentin Colombet2016-12-221-1/+12
| | | | | | | | | | | The InstructionSelect pass will not look at target specific instructions since they are already selected. As a result, the operands of target specific instructions must be properly constrained, because it is not going to fix them. This fixes invalid register classes on call instruction. llvm-svn: 290377
* AMDGPU: Invert cmp + select with constantMatt Arsenault2016-12-221-0/+19
| | | | | | | | | | | Canonicalize a select with a constant to the false side. This enables more instruction shrinking opportunities since an inline immediate can be used for the false side of v_cndmask_b32_e32. This seems to usually be better but causes some code size regressions in some tests. llvm-svn: 290372
* [Hexagon] Add DAG mutations for machine pipelinerKrzysztof Parzyszek2016-12-222-0/+9
| | | | llvm-svn: 290366
* Change the interface of TLI.isMultiStoresCheaperThanBitsMerge.Wei Mi2016-12-221-8/+4
| | | | | | | | | This is for splitMergedValStore in DAG Combine to share the target query interface with similar logic in CodeGenPrepare. Differential Revision: https://reviews.llvm.org/D24707 llvm-svn: 290363
* [mips] Fix compact branch hazard detection, part 2Petar Jovanovic2016-12-221-22/+15
| | | | | | | | | | | Follow up to D27209 fix, this patch now properly handles single transient instruction in basic block. Patch by Aleksandar Beserminji. Differential Revision: https://reviews.llvm.org/D27856 llvm-svn: 290361
* AMDGPU: Use i16 for i16 shift amountMatt Arsenault2016-12-222-8/+10
| | | | llvm-svn: 290351
* AMDGPU: Fix missing 16-bit cmpx instructionsMatt Arsenault2016-12-221-0/+39
| | | | llvm-svn: 290349
* AMDGPU: Use i16 comparison instructionsMatt Arsenault2016-12-222-5/+43
| | | | llvm-svn: 290348
* AMDGPU: Fixed '!NodePtr->isKnownSentinel()' assertMatt Arsenault2016-12-221-17/+4
| | | | | | | | Caused by dereferencing end iterator when trying to const cast the iterator. Patch by Martin Sherburn llvm-svn: 290347
* [WebAssembly] Add an "explicit" keyword to a constructor.Dan Gohman2016-12-221-1/+1
| | | | llvm-svn: 290345
* [WebAssembly] Don't use variadic operand indices in the MCOperandInfo array.Dan Gohman2016-12-221-4/+5
| | | | llvm-svn: 290344
* [WebAssembly] Don't old negative load/store offsets in fast-isel.Dan Gohman2016-12-221-9/+24
| | | | | | | WebAssembly's load/store offsets are unsigned and don't wrap, so it's not valid to fold in a negative offset. llvm-svn: 290342
* [AMDGPU] Add pseudo SDWA instructionsSam Kolton2016-12-225-85/+159
| | | | | | | | | | | | Summary: This is needed for later SDWA support in CodeGen. Reviewers: vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27412 llvm-svn: 290338
* [AMDGPU] Disassembler: fix for disaasembling v_mac_f32/16_dpp/sdwaSam Kolton2016-12-224-5/+26
| | | | | | | | | | | | Summary: Real instruction should copy constraints from real instruction. This allows auto-generated disassembler to correctly process tied operands. Reviewers: nhaustov, vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27847 llvm-svn: 290336
* [X86][AVX2] Passing the appropriate memory operand class to VPMADDWD ↵Ayman Musa2016-12-221-1/+1
| | | | | | | | | | instruction. Replacing the memory operand in the ymm version of VPMADDWD from i128mem to i256mem. Differential Revision: https://reviews.llvm.org/D28024 llvm-svn: 290333
* AMDGPU: Fix missing commute table entries for cmpxMatt Arsenault2016-12-221-4/+4
| | | | | | No tests because these aren't currently used anywhere. llvm-svn: 290316
* AMDGPU: Swap order of operands in fadd/fsub combineMatt Arsenault2016-12-221-4/+4
| | | | | | | FMA is canonicalized to constant in the middle operand. Do the same so fmad matches and avoid an extra combine step. llvm-svn: 290313
* AMDGPU: Check fast math flags in fadd/fsub combinesMatt Arsenault2016-12-222-7/+15
| | | | llvm-svn: 290312
* AMDGPU: Form more FMAs if fusion is allowedMatt Arsenault2016-12-222-30/+46
| | | | | | | Extend the existing fadd/fsub->fmad combines to produce FMA if allowed. llvm-svn: 290311
* AMDGPU: Move combines into separate functionsMatt Arsenault2016-12-222-152/+174
| | | | llvm-svn: 290309
* AMDGPU: Enable some f32 fadd/fsub combines for f16Matt Arsenault2016-12-221-7/+12
| | | | llvm-svn: 290308
* AMDGPU: Implement isFMAFasterThanFMulAndFAdd for f16Matt Arsenault2016-12-221-0/+2
| | | | llvm-svn: 290307
* AMDGPU: Allow rcp and rsq usage with f16Matt Arsenault2016-12-222-4/+8
| | | | llvm-svn: 290302
* AMDGPU: Custom lower f16 fdivMatt Arsenault2016-12-222-1/+22
| | | | llvm-svn: 290301
* AMDGPU: Implement f16 fcanonicalizeMatt Arsenault2016-12-223-0/+9
| | | | llvm-svn: 290300
OpenPOWER on IntegriCloud