summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [TargetLowering] Rename preferShiftsToClearExtremeBits and ↵Simon Pilgrim2019-04-166-11/+10
| | | | | | | | | | | | shouldFoldShiftPairToMask (PR41359) As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious. shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair llvm-svn: 358526
* [X86][AVX] X86ISD::PERMV/PERMV3 node types can never fold index opsSimon Pilgrim2019-04-161-5/+13
| | | | | | | | | | Improves codegen demonstrated by D60512 - instructions represented by X86ISD::PERMV/PERMV3 can never memory fold the operand used for their index register. This patch updates the 'isUseOfShuffle' helper into the more capable 'isFoldableUseOfShuffle' that recognises that the op is used for a X86ISD::PERMV/PERMV3 index mask and can't be folded - allowing us to use broadcast/subvector-broadcast ops to reduce the size of the mask constant pool data. Differential Revision: https://reviews.llvm.org/D60562 llvm-svn: 358516
* [Hexagon] Remove indeterministic traversal orderKrzysztof Parzyszek2019-04-161-9/+8
| | | | | | Patch by Sergei Larin. llvm-svn: 358505
* [RISCV] Custom lower SHL_PARTS, SRA_PARTS, SRL_PARTSLuis Marques2019-04-162-3/+108
| | | | | | | | | When not optimizing for minimum size (-Oz) we custom lower wide shifts (SHL_PARTS, SRA_PARTS, SRL_PARTS) instead of expanding to a libcall. Differential Revision: https://reviews.llvm.org/D59477 llvm-svn: 358498
* [X86] Limit the 'x' inline assembly constraint to zmm0-15 when used for a ↵Craig Topper2019-04-153-1/+8
| | | | | | | | 512 type. The 'v' constraint is used to select zmm0-31. This makes 512 bit consistent with 128/256-bit.a llvm-svn: 358450
* AMDGPU: Fix unreachable when counting register usage of SGPR96Matt Arsenault2019-04-151-0/+3
| | | | llvm-svn: 358447
* AMDGPU: Fix printed format of SReg_96Matt Arsenault2019-04-151-0/+3
| | | | | | | These are artificial, so I think this should only come up with inline asm comments. llvm-svn: 358446
* [X86] Block i32/i64 for 'k' and 'Yk' in getRegForInlineAsmConstraint without ↵Craig Topper2019-04-151-24/+21
| | | | | | | | avx512bw. 32 and 64 bit k-registers require avx512bw. If we don't block this properly, it leads to a crash. llvm-svn: 358436
* Add explicit dependency to MCDwarf.h in ARC backend.Pete Couperus2019-04-151-0/+1
| | | | llvm-svn: 358430
* [X86] Restore the pavg intrinsics.Craig Topper2019-04-151-0/+6
| | | | | | | | | | | | | | | The pattern we replaced these with may be too hard to match as demonstrated by PR41496 and PR41316. This patch restores the intrinsics and then we can start focusing on the optimizing the intrinsics. I've mostly reverted the original patch that removed them. Though I modified the avx512 intrinsics to not have masking built in. Differential Revision: https://reviews.llvm.org/D60674 llvm-svn: 358427
* Add slbfee instruction.Sean Fertile2019-04-154-0/+7
| | | | llvm-svn: 358425
* [AMDGPU] Fixed incorrect test in vcnd/vcmp optimizationTim Renouf2019-04-151-1/+1
| | | | | | | | | | | | This fixes a test I introduced in change D59191 (that added src0 and src1 modifiers to the v_cndmask instruction for disassembly purposes). Spotted by David Binderman in bug 41488. Differential Revision: https://reviews.llvm.org/D60652 Change-Id: I6ac95e66cd84e812ed3359ad57bcd0e13198ba0c llvm-svn: 358392
* [Sparc] Fix typo. NFC.Jim Lin2019-04-151-2/+2
| | | | llvm-svn: 358370
* [GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with ↵Amara Emerson2019-04-152-19/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | constants only. Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't be degraded. This change also improves the IRTranslator so that in most places, but not all, it creates constants using the MIRBuilder directly instead of first creating a new destination vreg and then creating a constant. By doing this, the buildConstant() method can just return the vreg of an existing G_CONSTANT instead of having to create a COPY from it. I measured a 0.2% improvement in compile time and a 0.9% improvement in code size at -O0 ARM64. Compile time: Program base cse diff test-suite...ark/tramp3d-v4/tramp3d-v4.test 9.04 9.12 0.8% test-suite...Mark/mafft/pairlocalalign.test 2.68 2.66 -0.7% test-suite...-typeset/consumer-typeset.test 5.53 5.51 -0.4% test-suite :: CTMark/lencod/lencod.test 5.30 5.28 -0.3% test-suite :: CTMark/Bullet/bullet.test 25.82 25.76 -0.2% test-suite...:: CTMark/ClamAV/clamscan.test 6.92 6.90 -0.2% test-suite...TMark/7zip/7zip-benchmark.test 34.24 34.17 -0.2% test-suite :: CTMark/SPASS/SPASS.test 6.25 6.24 -0.1% test-suite...:: CTMark/sqlite3/sqlite3.test 1.66 1.66 -0.1% test-suite :: CTMark/kimwitu++/kc.test 13.61 13.60 -0.0% Geomean difference -0.2% Code size: Program base cse diff test-suite...-typeset/consumer-typeset.test 1315632 1266480 -3.7% test-suite...:: CTMark/ClamAV/clamscan.test 1313892 1297508 -1.2% test-suite :: CTMark/lencod/lencod.test 1439504 1423112 -1.1% test-suite...TMark/7zip/7zip-benchmark.test 2936980 2904172 -1.1% test-suite :: CTMark/Bullet/bullet.test 3478276 3445460 -0.9% test-suite...ark/tramp3d-v4/tramp3d-v4.test 8082868 8033492 -0.6% test-suite :: CTMark/kimwitu++/kc.test 3870380 3853972 -0.4% test-suite :: CTMark/SPASS/SPASS.test 1434904 1434896 -0.0% test-suite...Mark/mafft/pairlocalalign.test 764528 764528 0.0% test-suite...:: CTMark/sqlite3/sqlite3.test 782092 782092 0.0% Geomean difference -0.9% Differential Revision: https://reviews.llvm.org/D60580 llvm-svn: 358369
* [GlobalISel] Introduce a CSEConfigBase class to allow targets to define ↵Amara Emerson2019-04-155-0/+31
| | | | | | | | | | | | | | their own CSE configs. Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE configs that can be passed between TargetPassConfig and the targets' custom pass configs. This CSEConfigBase allows targets to create custom CSE configs which is then used by the GISel passes for the CSEMIRBuilder. This support will be used in a follow up commit to allow constant-only CSE for -O0 compiles in D60580. llvm-svn: 358368
* [X86] Redefine KUNPCK instructions to take a narrower source register class ↵Craig Topper2019-04-141-11/+9
| | | | | | | | | than destination register class. Remove copies from the isel output pattern. There's no reason for the inputs to be the destination register class. This just forces an unnecessary copy in the output patterns. llvm-svn: 358362
* [X86] Put the locked mi8 instrutions above the locked mi/mi32 so they will ↵Craig Topper2019-04-141-24/+26
| | | | | | | | | be prefered. We want 64mi8 to be prefered over 64mi32. The order for 16mi/32mi doesn't really matter. llvm-svn: 358361
* [X86] Change IMUL with immediate instruction order to ri8 instructions come ↵Craig Topper2019-04-141-31/+34
| | | | | | | | | | before ri/ri32 instructions. This will ensure IMUL64ri8 is tried before IMUL64ri32. For IMUL32 and IMUL16 the order doesn't really matter because only the ri8 versions use a predicate. That automatically gives them priority. llvm-svn: 358360
* [X86] Move VPTESTM matching from the isel table to custom code in ↵Craig Topper2019-04-142-251/+396
| | | | | | | | | | | | | | | | | | | | | | | | | | | | X86ISelDAGToDAG. We had many tablegen patterns for these instructions. And due to the commutability of the patterns, tablegen expands them to even more patterns. All together VPTESTMD patterns accounted for more the 50K of the 610K isel table. This had gotten bad when we stopped canonicalizing AND to vXi64. This required a pattern for every combination of bitcast input type. This change moves the matching to custom code where it is easier to look through the bitcasts without being concerned with the specific types. The test changes are because we are now stricter with one use checks as its required to make load folding legal. We now require the AND and any BITCAST to only have a single use. This prevents forming VPTESTM and a VPAND with the same inputs. We now support broadcast loads for 128/256 patterns without VLX. We'll widen to 512-bit like and still fold the broadcast since the amount of memory read doesn't change. There are a few tests that got slightly longer because are now prefering load + VPTESTM over XOR+VPCMPEQ for (seteq (load), allzeros). Previously we were able to share the XOR with multiple VPTESTM instructions. llvm-svn: 358359
* [X86] Don't form masked vpcmp/vcmp/vptestm operations if the setcc node has ↵Craig Topper2019-04-141-148/+256
| | | | | | | | | | | | | | more than one use. We're better of emitting a single compare + kand rather than a compare for the other use and a masked compare. I'm looking into using custom instruction selection for VPTESTM to reduce the ridiculous number of permutations of patterns in the isel table. Putting a one use check on all masked compare folding makes load fold matching in the custom code easier. llvm-svn: 358358
* [X86] Remove some unused tablegen multiclasses. NFCCraig Topper2019-04-141-27/+0
| | | | llvm-svn: 358345
* [X86] Use PC-relative mode for the kernel code modelBill Wendling2019-04-131-9/+14
| | | | | | | | | | | | | | | | | | Summary: The Linux kernel uses PC-relative mode, so allow that when the code model is "kernel". Reviewers: craig.topper Reviewed By: craig.topper Subscribers: llvm-commits, kees, nickdesaulniers Tags: #llvm Differential Revision: https://reviews.llvm.org/D60643 llvm-svn: 358343
* [X86] Use int64_t and isInt<N> instead of APInt operations in ↵Craig Topper2019-04-131-8/+6
| | | | | | | | | | foldLoadStoreIntoMemOperand. NFC We know all our values are limited to 64 bits here so we don't need an APInt. This should save some generated code checking between large and small size. llvm-svn: 358338
* [WebAssembly] Use Function::hasOptSize() (NFC)Heejin Ahn2019-04-131-2/+1
| | | | | | | | | | | | | | | | Summary: Use member function. Reviewers: aheejin Subscribers: sunfish, hiraditya, sbc100, jgravelle-google, dschuff, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60651 Patch by Hideto Ueno (uenoku) llvm-svn: 358336
* [AArch64][GlobalISel] Enable copy elision in the pre-legalizer combine and ↵Amara Emerson2019-04-131-0/+2
| | | | | | | | | | | | | fix a crash. This enables the simple copy combine that already exists in the CombinerHelper. However, it exposed a bug in the GISelChangeObserver where it wouldn't clear a set of MIs to process, and so would end up causing a crash when deleted MIs were being added to the combiner worklist again. Differential Revision: https://reviews.llvm.org/D60579 llvm-svn: 358318
* [AArch64][GlobalISel] Fix a crash when selecting shufflevectors with an ↵Amara Emerson2019-04-121-7/+17
| | | | | | | | | | | undef mask element. If a shufflevector's mask vector has an element with "undef" then the generic instruction defining that element register is a G_IMPLICT_DEF instead of G_CONSTANT. This fixes the selector to handle this case, and for now assumes that undef just means zero. In future we'll optimize this case properly. llvm-svn: 358312
* [WebAssembly] Add mutable-globals to bleeding-edge CPUThomas Lively2019-04-121-1/+2
| | | | | | | | | | | | | | Summary: This brings the backend in line with Clang. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60594 llvm-svn: 358310
* [Hexagon] Fix reuse bug in Vector Loop Carried Reuse passBrendon Cahoon2019-04-121-3/+10
| | | | | | | | | | | | | | | The Hexagon Vector Loop Carried Reuse pass was allowing reuse between two shufflevectors with different masks. The reason is that the masks are not instruction objects, so the code that checks each operand just skipped over the operands. This patch fixes the bug by checking if the operands are the same when they are not instruction objects. If the objects are not the same, then the code assumes that reuse cannot occur. Differential Revision: https://reviews.llvm.org/D60019 llvm-svn: 358292
* [X86][SSE] Recognise vXi1 boolean anyof/allof reduction patternsSimon Pilgrim2019-04-121-33/+56
| | | | | | | | | | | | Currently combineHorizontalPredicateResult only handles anyof/allof reduction patterns of legal types, which can be tricky to match as type legalization of bools can introduce bitcasts/truncs/extensions. This patch extends combineHorizontalPredicateResult to recognise vXi1 bool reductions as well and uses the existing combineBitcastvxi1 helper to create the MOVMSK necessary to then compare the signmask result. This ensures the accuracy of the reduction costs added in D60403 which assume the MOVMSK generation. Differential Revision: https://reviews.llvm.org/D60610 llvm-svn: 358286
* [PowerPC] Add initialization for some ppc passesKang Zhang2019-04-1213-49/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Some llc debug options need pass-name as the parameters. But if we use the pass-name ppc-early-ret, we will get below error: llc test.ll -stop-after ppc-early-ret LLVM ERROR: "ppc-early-ret" pass is not registered. Below pass-names have the pass is not registered error: ppc-ctr-loops ppc-ctr-loops-verify ppc-loop-preinc-prep ppc-toc-reg-deps ppc-vsx-copy ppc-early-ret ppc-vsx-fma-mutate ppc-vsx-swaps ppc-reduce-cr-ops ppc-qpx-load-splat ppc-branch-coalescing ppc-branch-select Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D60248 llvm-svn: 358271
* Add explicit dependencies on MCSection.h and MCDwarf.h to the .cppEric Christopher2019-04-124-0/+4
| | | | | | files rather than rely on transitive includes from MCStreamer.h. llvm-svn: 358263
* Revert "[PowerPC] Add initialization for some ppc passes"Eric Christopher2019-04-1213-28/+49
| | | | | | | This reverts commit 6f8f98ce8de7c0e4ebd7fa2e1fd9507fe8d1c317 as it is breaking nearly every bot. llvm-svn: 358260
* [PowerPC] Add initialization for some ppc passesKang Zhang2019-04-1213-49/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Some llc debug options need pass-name as the parameters. But if we use the pass-name ppc-early-ret, we will get below error: llc test.ll -stop-after ppc-early-ret LLVM ERROR: "ppc-early-ret" pass is not registered. Below pass-names have the pass is not registered error: ppc-ctr-loops ppc-ctr-loops-verify ppc-loop-preinc-prep ppc-toc-reg-deps ppc-vsx-copy ppc-early-ret ppc-vsx-fma-mutate ppc-vsx-swaps ppc-reduce-cr-ops ppc-qpx-load-splat ppc-branch-coalescing ppc-branch-select Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D60248 llvm-svn: 358256
* Include what's used in a few cpp files - these were getting transitiveEric Christopher2019-04-121-0/+1
| | | | | | includes from MCDwarf.h. llvm-svn: 358254
* [PowerPC] More precise exploitation of P9 maddld instruction when operands ↵Zi Xuan Wu2019-04-122-2/+13
| | | | | | | | | | | | | are constant There are 3 operands of maddld, (add (mul %1, %2), %3) and sometimes they are constant. If there is constant operand, it takes extra li to materialize the operand, and one more extra register too. So it's not profitable to use maddld to optimize mul-add pattern. Differential Revision: https://reviews.llvm.org/D60181 llvm-svn: 358253
* [X86AsmPrinter] refactor static functions into private methods. NFCNick Desaulniers2019-04-112-77/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: A lot of the code for printing special cases of operands in this translation unit are static functions. While I too have suffered many years of abuse at the hands of C, we should prefer private methods, particularly when you start passing around *this as your first argument, which is a code smell. This will help make generic vs arch specific asm printing easier, as it brings X86AsmPrinter more in line with other arch's derived AsmPrinters. We will then be able to more easily move architecture generic code to the base class, and architecture specific code to the derived classes. Some other small refactorings while we're here: - the parameter Op is now consistently OpNo - add spaces around binary expressions. I know we're not millionaires but c'mon. Reviewers: echristo Reviewed By: echristo Subscribers: smeenai, hiraditya, llvm-commits, srhines, craig.topper Tags: #llvm Differential Revision: https://reviews.llvm.org/D60577 llvm-svn: 358236
* [AArch64][GlobalISel] Flesh out vector load/store support for more types.Amara Emerson2019-04-111-0/+8
| | | | | | | Some of these were legalizing into smaller vector types unnecessarily, others were simply not supported yet. llvm-svn: 358223
* [AArch64][GlobalISel] Legalization and ISel support for load/stores of ↵Amara Emerson2019-04-113-9/+67
| | | | | | | | | | | | | | | | vectors of pointers. Loads and store of values with type like <2 x p0> currently don't get imported because SelectionDAG has no knowledge of pointer types. To leverage the existing support for vector load/stores, we can bitcast the value to have s64 element types instead. We do this as a custom legalization. This patch also adds support for general loads of <2 x s64>, and relaxes some type conditions on selecting G_BITCAST. Differential Revision: https://reviews.llvm.org/D60534 llvm-svn: 358221
* [X86] Restrict vselect handling in scalarizeExtEltFP to only case to pre ↵Craig Topper2019-04-111-0/+4
| | | | | | | | | | type legalization where the setcc result type is vXi1. If the vector setcc has been legalized then we will need to convert a vector boolean of 0 or -1 to a scalar boolean of 0 or 1. The added test case previously crashed in 32-bit mode by creating a setcc with an i64 condition that type legalization couldn't expand. llvm-svn: 358218
* [X86] Add patterns for using movss/movsd for atomic load/store of f32/64. ↵Craig Topper2019-04-112-70/+51
| | | | | | | | | | | | | | Remove atomic fadd pseudos use isel patterns instead. This patch adds patterns for turning bitcasted atomic load/store into movss/sd. It also removes the pseudo instructions for atomic RMW fadd. Instead just adding isel patterns for folding an atomic load into addss/sd. And relying on the new movss/sd store pattern to handle the write part. This also makes the fadd patterns use VEX and EVEX instructions when AVX or AVX512F are enabled. Differential Revision: https://reviews.llvm.org/D60394 llvm-svn: 358215
* Recommit r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit ↵Craig Topper2019-04-113-20/+75
| | | | | | | | | | | | targets with X87, but no SSE2" With correct test checks this time. If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integ This matches what gcc and icc do for this case and removes an existing FIXME. llvm-svn: 358214
* Revert r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit ↵Craig Topper2019-04-113-75/+20
| | | | | | | | targets with X87, but no SSE2" I seem to have messed up the test checks. llvm-svn: 358212
* [X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, ↵Craig Topper2019-04-113-20/+75
| | | | | | | | | | | | but no SSE2 If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integer and store it to a stack temporary. From there we can do two 32-bit loads to get the value into integer registers without worrying about atomicness. This matches what gcc and icc do for this case and removes an existing FIXME. Differential Revision: https://reviews.llvm.org/D60156 llvm-svn: 358211
* [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMV3 mask supportSimon Pilgrim2019-04-111-1/+1
| | | | | | Completes SimplifyDemandedVectorElts's basic variable shuffle mask support which should help D60512 + D60562 llvm-svn: 358186
* [RISCV] Diagnose invalid second input register operand when using %tprel_addRoger Ferrer Ibanez2019-04-111-2/+26
| | | | | | | | | | | | | | | | | | | RISCVMCCodeEmitter::expandAddTPRel asserts that the second operand must be x4/tp. As we are not currently checking this in the RISCVAsmParser, the assert is easy to trigger due to wrong assembly input. This patch does a late check of this constraint. An alternative could be using a singleton register class for x4/tp similar to the current one for sp. Unfortunately it does not result in a good diagnostic. Because add is an overloaded mnemonic, if no matching is possible, the diagnostic of the first failing alternative seems to be used as the diagnostic itself. This means that this case the %tprel_add is diagnosed as an invalid operand (because the real add instruction only has 3 operands). Differential Revision: https://reviews.llvm.org/D60528 llvm-svn: 358183
* [X86] Add MM register mapping from CodeView to MC register idLuo, Yuanke2019-04-111-0/+9
| | | | | | | Differential Revision: https://reviews.llvm.org/D60437 Change-Id: I2183a6d825d0284b22705d423b88882992b236c5 llvm-svn: 358179
* [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMV mask supportSimon Pilgrim2019-04-111-0/+8
| | | | llvm-svn: 358174
* [AArch64] Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64Diogo N. Sampaio2019-04-111-0/+2
| | | | | | | | | | | | | | | | Summary: Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64 Reviewers: pbarrio, DavidSpickett, LukeGeeson Reviewed By: LukeGeeson Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60259 llvm-svn: 358171
* [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMILPV mask supportSimon Pilgrim2019-04-111-1/+2
| | | | llvm-svn: 358170
* [X86] SimplifyDemandedVectorElts - add X86ISD::VPERMIL2 mask supportSimon Pilgrim2019-04-111-2/+2
| | | | llvm-svn: 358167
OpenPOWER on IntegriCloud