path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [X86] Support for the mno-tls-direct-seg-refs flag | Kristina Brooks | 2018-10-18 | 1 | -0/+6
  Allows disabling direct TLS segment access (%fs or %gs). GCC supports a similar flag; it can be useful in some circumstances, e.g. when a thread context block needs to be updated directly from user space. More info and specific use cases: https://bugs.llvm.org/show_bug.cgi?id=16145

  There is another revision for clang as well. Related: D53102

  All X86 CodeGen tests appear to pass:

  ```
  [46/47] Running lit suite /SourceCache/llvm-trunk-8.0/test/CodeGen
  Testing Time: 23.17s
    Expected Passes    : 3801
    Expected Failures  : 15
    Unsupported Tests  : 8021
  ```

  Reviewed by: Craig Topper.

  Patch by nruslan (Ruslan Nikolaev).

  Differential Revision: https://reviews.llvm.org/D53103
  llvm-svn: 344723
* AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI | Nicolai Haehnle | 2018-10-17 | 3 | -3/+26
  Summary:
  Works around a hardware issue in the (base + offset) calculation when base is negative. The impact on code quality should be limited since SILoadStoreOptimizer still runs afterwards and is able to combine loads/stores based on known sign information.

  This fixes visible corruption in Hitman on SI (easily reproducible by running benchmark mode).

  Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923

  Reviewers: arsenm, mareko
  Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53160
  llvm-svn: 344698
* AMDGPU: Divergence-driven selection of scalar buffer load intrinsics | Nicolai Haehnle | 2018-10-17 | 6 | -220/+90
  Summary:
  Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis.

  If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane.

  There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads.

  Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

  Reviewers: arsenm, alex-t, rampitec, tpr
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53283
  llvm-svn: 344696
* [ARM] bottom-top mul support in ARMParallelDSP | Sam Parker | 2018-10-17 | 1 | -27/+194
  Previously reverted in rL343082. Original commit message:

  On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load.

  Differential Revision: https://reviews.llvm.org/D51983
  llvm-svn: 344693
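The bottom/top selection mentioned in this message can be illustrated with a small model. This is only a sketch of the documented semantics of the ARM `SMULBB`/`SMULBT`/`SMULTB`/`SMULTT` instructions, not the ARMParallelDSP implementation. The helper names (`wide_load`, `bottom`, `top`) are hypothetical, and a little-endian layout is assumed so that the first 16-bit element lands in the bottom half of the 32-bit word.

```python
def sext16(value):
    """Interpret the low 16 bits of value as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def wide_load(first, second):
    """Model one 32-bit load covering two adjacent 16-bit elements.

    Assumption: little-endian, so `first` ends up in bits [15:0].
    """
    return ((second & 0xFFFF) << 16) | (first & 0xFFFF)


def bottom(word):
    return sext16(word)


def top(word):
    return sext16(word >> 16)


# SMUL<x><y>: x picks the half of rn, y picks the half of rm.
def smulbb(rn, rm):
    return bottom(rn) * bottom(rm)


def smulbt(rn, rm):
    return bottom(rn) * top(rm)


def smultb(rn, rm):
    return top(rn) * bottom(rm)


def smultt(rn, rm):
    return top(rn) * top(rm)


a = wide_load(3, -7)    # a[0] = 3,  a[1] = -7
b = wide_load(-5, 11)   # b[0] = -5, b[1] = 11
products = (smulbb(a, b), smulbt(a, b), smultb(a, b), smultt(a, b))
```

Each product uses one wide load per array instead of two separate 16-bit loads, which is the saving the message describes.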
* AMDGPU: Remove dead TableGen code | Nicolai Haehnle | 2018-10-17 | 1 | -2/+0
  Summary:
  Change-Id: Ic1f2c1d0cf9e90a0baa9fc6bacd0d3c386069fb0

  Reviewers: tpr
  Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53318

  Change-Id: Ib4d143c898801e5cf6cb9999a495d62c91ae77fb
  llvm-svn: 344691
* [MIPS GlobalISel] Legalize constants | Petar Jovanovic | 2018-10-17 | 1 | -1/+24
  Legalize s1, s8, s16 and s64 G_CONSTANT for MIPS32.

  Patch by Petar Avramovic.

  Differential Revision: https://reviews.llvm.org/D53077
  llvm-svn: 344684
* [ARM] Do not fuse VADD and VMUL, continued (2/2) | Sjoerd Meijer | 2018-10-17 | 1 | -2/+4
  This is patch 2/2, following up on D53314, and is the functional change to prevent fusing mul + add sequences into VFMAs.

  Differential revision: https://reviews.llvm.org/D53315
  llvm-svn: 344683
* [ARM] Follow up of rL344671, attempt to pacify a buildbot | Sjoerd Meijer | 2018-10-17 | 1 | -1/+1
  It was rightfully complaining about an unpretty logical expression.

  llvm-svn: 344677
* [ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2) | Sjoerd Meijer | 2018-10-17 | 3 | -42/+41
  This is a follow-up of rL342874, which stopped fusing muls and adds into VMLAs for performance reasons on the Cortex-M4 and Cortex-M33. This is a series of 2 patches that tries to achieve the same for VFMA. The second column in the table below shows what we were generating before rL342874, the third column what changed with rL342874, and the last column what we want to achieve with these 2 patches:

  ----------------------------------------------------------
  | Opt    | < rL342874 | >= rL342874 | target             |
  |--------------------------------------------------------|
  | -O3    | vmla       | vmul        | vmul               |
  |        |            | vadd        | vadd               |
  |--------------------------------------------------------|
  | -Ofast | vfma       | vfma        | vmul               |
  |        |            |             | vadd               |
  |--------------------------------------------------------|
  | -Oz    | vmla       | vmla        | vmla               |
  ----------------------------------------------------------

  This patch, 1/2, is a cleanup of the spaghetti predicate logic on the different VMLA and VFMA codegen rules, so that we can make the final functional change in patch 2/2. This also fixes a typo in the regression test added in rL342874.

  Differential revision: https://reviews.llvm.org/D53314
  llvm-svn: 344671
* [X86] Match (cmp (and (shr X, C), mask), 0) to BEXTR+TEST. | Craig Topper | 2018-10-16 | 1 | -15/+32
  Without this we match the CMP+AND to a TEST and then match the SHR separately.

  I'm trusting analyzeCompare to remove the TEST during the peephole pass. Otherwise we need to check the flag users to see if they only use the Z flag.

  This recovers a case lost by r344270.

  Differential Revision: https://reviews.llvm.org/D53310
  llvm-svn: 344649
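The equivalence behind this pattern can be sketched as follows. The model assumes the mask is a contiguous low-bit mask of the form `(1 << n) - 1`, and it follows the documented BEXTR control encoding (start index in bits 7:0, field length in bits 15:8). The function names are hypothetical and this is not the isel code from the patch.

```python
def bextr(src, control, width=64):
    """Model of BEXTR: extract `length` bits of src starting at bit `start`."""
    start = control & 0xFF
    length = (control >> 8) & 0xFF
    src &= (1 << width) - 1
    if start >= width:
        return 0
    return (src >> start) & ((1 << length) - 1)


def field_is_zero_shift_and(x, shift, nbits):
    """The original pattern: (cmp (and (shr X, C), mask), 0)."""
    mask = (1 << nbits) - 1
    return ((x >> shift) & mask) == 0


def field_is_zero_bextr(x, shift, nbits):
    """The same test expressed as BEXTR followed by a compare with zero."""
    control = shift | (nbits << 8)
    return bextr(x, control) == 0
```

Both functions answer the same question, so a single bit-field extract can stand in for the separate shift and mask before the zero test.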
* Revert "[WebAssembly] LSDA info generation" | Krasimir Georgiev | 2018-10-16 | 3 | -21/+3
  This reverts commit r344575.

  Newly introduced test eh-lsda.ll.test fails with use-after-free under ASAN build.

  llvm-svn: 344639
* [PATCH] [NFC][AArch64] Fix refactoring of macro fusion | Evandro Menezes | 2018-10-16 | 1 | -8/+4
  Fix compiler error.

  llvm-svn: 344632
* [NFC][ARM] Refactor macro fusion | Evandro Menezes | 2018-10-16 | 1 | -19/+5
  Simplify code for wildcards.

  llvm-svn: 344625
* [NFC][AArch64] Refactor macro fusion | Evandro Menezes | 2018-10-16 | 1 | -76/+90
  Simplify API of checking functions.

  llvm-svn: 344624
* [X86] Fix Skylake ReadAfterLd for PADDrm etc. | Simon Pilgrim | 2018-10-16 | 2 | -4/+8
  Missed in rL343868 due to their custom InstrRW.

  llvm-svn: 344600
* [mips][micromips] Fix how values in .gcc_except_table are calculated | Aleksandar Beserminji | 2018-10-16 | 2 | -0/+10
  When a landing pad is calculated in a program that is compiled for micromips, it will point to an even address. Such an error will cause a segmentation fault, as the instructions in micromips are aligned on odd addresses. This patch sets the lowest bit of the landing pad offset to 1, which effectively yields an odd address that points exactly at the instruction.

  Differential Revision: https://reviews.llvm.org/D52985
  llvm-svn: 344591
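The adjustment this message describes amounts to forcing bit 0 of the landing pad offset to 1. A minimal sketch, assuming the usual MIPS convention that bit 0 of a code address acts as the ISA mode bit selecting microMIPS execution; the function name is hypothetical and does not come from the patch:

```python
def micromips_landing_pad(offset):
    """Return the landing pad offset with its lowest bit set.

    Assumption: `offset` is the even byte offset computed for the landing pad;
    setting bit 0 marks the target as microMIPS code.
    """
    return offset | 1
```

Using a bitwise OR rather than adding 1 keeps the operation idempotent, so an offset that is already odd is left unchanged.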
* [WebAssembly] LSDA info generation | Heejin Ahn | 2018-10-16 | 3 | -3/+21
  Summary:
  This adds support for LSDA (exception table) generation for wasm EH. Wasm EH mostly follows the structure of Itanium-style exception tables, with one exception: a call site table entry in wasm EH corresponds to not a call site but a landing pad.

  In wasm EH, the VM is responsible for stack unwinding. After an exception occurs and the stack is unwound, the control flow is transferred to wasm 'catch' instruction by the VM, after which the personality function is called from the compiler-generated code. (Refer to WasmEHPrepare pass for more information on this part.)

  This patch:
  - Changes wasm.landingpad.index intrinsic to take a token argument, to make this 1:1 match with a catchpad instruction
  - Stores landingpad index info and catch type info in MachineFunction before instruction selection
  - Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an exception table
  - Adds WasmException class with overridden methods for table generation
  - Adds support for LSDA section in Wasm object writer

  Reviewers: dschuff, sbc100, rnk
  Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits
  Differential Revision: https://reviews.llvm.org/D52748
  llvm-svn: 344575
* [X86] Remove some isel patterns that shouldn't be possible. | Craig Topper | 2018-10-15 | 2 | -6/+0
  These included a bitcast of a load from v4f32 to v2f64, but DAG combine should have already changed the type of the load to remove the cast.

  llvm-svn: 344573
* [X86] Fix a bad bitcast in the load form of vXi16 uniform shift patterns for EVEX encoded instructions. | Craig Topper | 2018-10-15 | 1 | -9/+10
  llvm-svn: 344563
* [AARCH64] Improve vector popcnt lowering with ADDLP | Simon Pilgrim | 2018-10-15 | 1 | -12/+36
  AARCH64 equivalent to D53257 - uses widening pairwise adds on vXi8 CTPOP to support i16/i32/i64 vectors.

  This is a blocker for generic vector CTPOP expansion (P32655) - this will remove the aarch64 diff from D53258.

  Differential Revision: https://reviews.llvm.org/D53259
  llvm-svn: 344554
* AMDGPU: Generate .amdgcn_target for object code v3 | Konstantin Zhuravlyov | 2018-10-15 | 1 | -3/+10
  Differential Revision: https://reviews.llvm.org/D53221
  llvm-svn: 344552
* [mips][micromips] Fix overlapping FDEs error | Aleksandar Beserminji | 2018-10-15 | 2 | -0/+24
  When compiling a static executable for micromips, CFI symbols are incorrectly labeled as MICROMIPS, which causes the ".eh_frame_hdr refers to overlapping FDEs." error.

  This patch does not label CFI symbols as MICROMIPS, and FDEs do not overlap anymore. This patch also exposes another bug, which is fixed here: https://reviews.llvm.org/D52985

  Differential Revision: https://reviews.llvm.org/D52987
  llvm-svn: 344516
* [mips][micromips] Revert "Fix overlapping FDEs error" | Aleksandar Beserminji | 2018-10-15 | 2 | -24/+0
  This reverts r344511.

  llvm-svn: 344515
* [ARM][NEON] Improve vector popcnt lowering with PADDL (PR39281) | Simon Pilgrim | 2018-10-15 | 1 | -130/+26
  As I suggested on PR39281, this patch uses PADDL pairwise addition to widen from the vXi8 CTPOP result to the target vector type.

  This is a blocker for moving more x86 code to generic vector CTPOP expansion (P32655 + D53258). ARM's vXi64 CTPOP currently expands, which would generate a vXi64 MUL, but ARM's custom lowering expands the general MUL case and vectors aren't well handled in LegalizeDAG. Improving the CTPOP lowering was a lot easier than fixing the MUL lowering for this one case.

  Differential Revision: https://reviews.llvm.org/D53257
  llvm-svn: 344512
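The widening scheme this message describes can be modelled on plain lists: count bits per byte first, then repeatedly add adjacent lanes, doubling the lane width each time. This is a sketch of the idea only, with hypothetical helper names; it does not reproduce the NEON lowering code, and it assumes little-endian lane order.

```python
import struct


def ctpop_bytes(data):
    """Per-byte population count, modelling a vXi8 CTPOP."""
    return [bin(byte).count("1") for byte in data]


def pairwise_add_long(lanes):
    """Add adjacent lanes, halving the lane count (a PADDL/ADDLP-style step)."""
    return [lanes[i] + lanes[i + 1] for i in range(0, len(lanes), 2)]


def ctpop_wide(data, lane_bits):
    """Population count per lane of `lane_bits` bits (8, 16, 32 or 64)."""
    lanes = ctpop_bytes(data)
    width = 8
    while width < lane_bits:
        lanes = pairwise_add_long(lanes)
        width *= 2
    return lanes


# Two 64-bit lanes packed little-endian into a 128-bit vector.
vector = struct.pack("<2Q", 0xFF00FF00FF00FF00, 0x1)
counts = ctpop_wide(vector, 64)
```

Three pairwise steps take the byte counts to 64-bit lanes, so no multiply is needed for the wide case.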
* [mips][micromips] Fix overlapping FDEs error | Aleksandar Beserminji | 2018-10-15 | 2 | -0/+24
  When compiling a static executable for micromips, CFI symbols are incorrectly labeled as MICROMIPS, which causes the ".eh_frame_hdr refers to overlapping FDEs." error.

  This patch does not label CFI symbols as MICROMIPS, and FDEs do not overlap anymore. This patch also exposes another bug, which is fixed here: https://reviews.llvm.org/D52985

  Differential Revision: https://reviews.llvm.org/D52987
  llvm-svn: 344511
* [TI removal] Make variables declared as `TerminatorInst` and initialized by `getTerminator()` calls instead be declared as `Instruction`. | Chandler Carruth | 2018-10-15 | 5 | -5/+5
  This is the biggest remaining chunk of the usage of `getTerminator()` that insists on the narrow type and so is an easy batch of updates. Several files saw more extensive updates where this would cascade to requiring API updates within the file to use `Instruction` instead of `TerminatorInst`. All of these were trivial in nature (pervasively using `Instruction` instead just worked).

  llvm-svn: 344502
* [X86] Move promotion of vector and/or/xor from legalization to DAG combine | Craig Topper | 2018-10-15 | 1 | -17/+34
  Summary:
  I've noticed that the bitcasts we introduce for these make computeKnownBits and computeNumSignBits not work well in LegalizeVectorOps. LegalizeVectorOps legalizes bottom up while LegalizeDAG legalizes top down. The bottom up strategy for LegalizeVectorOps means operands are legalized before their uses. So we promote and/or/xor before we legalize the nodes that use them, making computeKnownBits/computeNumSignBits in places like LowerTruncate suboptimal.

  I looked at changing LegalizeVectorOps to be top down as well, but that was more disruptive and caused some regressions. I also looked at just moving promotion of binops to LegalizeDAG, but that had a few issues, one around matching AND,ANDN,OR into VSELECT because I had to create ANDN as vXi64 but the other nodes hadn't legalized yet. I didn't look too hard at fixing that.

  This patch seems to produce better results overall than my other attempts. We now form broadcasts of constants better in some cases. For at least some of them the AND was being introduced in LegalizeDAG, promoted to vXi64, and the BUILD_VECTOR was also legalized there. I think we got bad ordering of that. Now the promotion is out of the legalizer so we handle this better.

  In the longer term I think we really should evaluate whether we should be doing this promotion at all. It's really there to reduce isel pattern count, but I'm wondering if we'd be better served just eating the pattern cost or doing C++ based isel for vector and/or/xor in X86ISelDAGToDAG. The masked and/or/xor will definitely be difficult in patterns if a bitcast gets between the vselect and the and/or/xor node. That becomes a lot of permutations to cover.

  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D53107
  llvm-svn: 344487
* [X86] Add 128 MOVDDUP to the constant pool printing in X86AsmPrinter::EmitInstruction. | Craig Topper | 2018-10-15 | 1 | -0/+6
  We use this instruction to broadcast a single 64-bit value to a v2i64/v2f64 vector.

  llvm-svn: 344486
* [X86][AVX] Enable lowerVectorShuffleAsLanePermuteAndPermute v16i16/v32i8 shuffle lowering | Simon Pilgrim | 2018-10-14 | 1 | -0/+10
  Extends D53148 from v4f64 now that we have test coverage for v16i16/v32i8 shuffles.

  llvm-svn: 344481
* recommit 344472 after fixing build failure on ARM and PPC. | Dorit Nuzman | 2018-10-14 | 12 | -25/+52
  llvm-svn: 344475
* revert 344472 due to failures. | Dorit Nuzman | 2018-10-14 | 12 | -52/+25
  llvm-svn: 344473
* [IAI,LV] Add support for vectorizing predicated strided accesses using masked interleave-group | Dorit Nuzman | 2018-10-14 | 12 | -25/+52
  The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control). Targets that support masked vector loads/stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles.

  Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar
  Reviewed By: Ayal
  Differential Revision: https://reviews.llvm.org/D53011
  llvm-svn: 344472
* [X86] Fix bad indentation. NFC | Craig Topper | 2018-10-14 | 1 | -1/+1
  llvm-svn: 344471
* [X86] Type legalize v2f32 stores by widening to v4f32, casting to v2f64, extracting f64 and storing. | Craig Topper | 2018-10-14 | 1 | -13/+34
  Summary:
  This is similar to what D52528 did for loads. It should match what generic type legalization does in 64-bit mode where it uses a v2i64 cast and an i64 store.

  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D53173
  llvm-svn: 344470
* Move some helpers from the global namespace into anonymous ones. | Benjamin Kramer | 2018-10-13 | 3 | -5/+7
  llvm-svn: 344468
* [WebAssembly][NFC] Fix signed/unsigned comparison warning | Thomas Lively | 2018-10-13 | 1 | -1/+3
  llvm-svn: 344459
* [X86][SSE] Remove most of vector CTTZ custom lowering and use LegalizeDAG instead. | Simon Pilgrim | 2018-10-13 | 1 | -28/+7
  There is one remnant - AVX1 custom splitting of 256-bit vectors - which is due to a regression where the X86ISD::ANDNP is still performed as a YMM.

  I've also tightened the CTLZ or CTPOP lowering in SelectionDAGLegalize::ExpandBitCount to require a legal CTLZ - it doesn't affect existing users and fixes an issue with AVX512 codegen.

  llvm-svn: 344457
* [X86][SSE] Begin removing vector CTTZ custom lowering and use LegalizeDAG instead. | Simon Pilgrim | 2018-10-13 | 1 | -9/+8
  Adds CTTZ vector legalization support and begins the removal of the X86/SSE custom lowering.

  llvm-svn: 344453
* [X86][SSE] combineIncDecVector - use isConstantSplat | Simon Pilgrim | 2018-10-13 | 1 | -3/+1
  Use isConstantSplat instead of ISD::isConstantSplatVector to let us peek through to illegal types (in this case for i686 targets to recognise i64 constants).

  llvm-svn: 344452
* [X86] Pull out target constant splat helper function. NFCI. | Simon Pilgrim | 2018-10-13 | 1 | -17/+27
  The code in LowerScalarImmediateShift is just a more powerful version of ISD::isConstantSplatVector.

  llvm-svn: 344451
* Pull out repeated getOperand(). NFCI. | Simon Pilgrim | 2018-10-13 | 1 | -3/+2
  llvm-svn: 344450
* Remove unused variable. NFCI. | Simon Pilgrim | 2018-10-13 | 1 | -1/+0
  llvm-svn: 344449
* [X86][SSE] Improve CTTZ lowering when CTLZ is legal | Simon Pilgrim | 2018-10-13 | 1 | -11/+13
  If we have better CTLZ support than CTPOP, then use cttz(x) = width - ctlz(~x & (x - 1)) - and remove the CTTZ_ZERO_UNDEF handling as it no longer gives better codegen.

  Similar to rL344447, this is also closer to LegalizeDAG's approach.

  llvm-svn: 344448
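The identity cttz(x) = width - ctlz(~x & (x - 1)) can be checked with a scalar model. This is a sketch for a single 32-bit lane, with hypothetical function names; it is not the vector lowering itself.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def ctlz(x):
    """Count leading zeros of a WIDTH-bit value (WIDTH for x == 0)."""
    return WIDTH - (x & MASK).bit_length()


def cttz_via_ctlz(x):
    """cttz(x) = width - ctlz(~x & (x - 1))."""
    # ~x & (x - 1) keeps exactly the bits below the lowest set bit of x.
    low_ones = ~x & (x - 1) & MASK
    return WIDTH - ctlz(low_ones)


def cttz_reference(x):
    """Straightforward bit-by-bit count of trailing zeros."""
    x &= MASK
    if x == 0:
        return WIDTH
    count = 0
    while x & 1 == 0:
        x >>= 1
        count += 1
    return count
```

For x == 0 the intermediate mask is all ones, so the formula yields the full width without a special case.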
* [X86][SSE] Change CTTZ vector lowering to cttz(x) = ctpop(~x & (x - 1)) | Simon Pilgrim | 2018-10-13 | 1 | -8/+12
  This patch changes the vector CTTZ lowering from:

    cttz(x) = ctpop((x & -x) - 1)

  to:

    cttz(x) = ctpop(~x & (x - 1))

  Not only does this make better use of the PANDN instruction, but it also matches the LegalizeDAG method which should allow us to remove the x86 specific code at some point in the future (we need to fix some issues with the bitcasted logic ops and CTPOP lowering first).

  Differential Revision: https://reviews.llvm.org/D53214
  llvm-svn: 344447
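Both formulas in this message compute the same value, which a scalar model makes easy to confirm. The sketch below works on one 32-bit lane and uses hypothetical function names; it says nothing about which form produces better machine code.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def ctpop(x):
    """Population count of a WIDTH-bit value."""
    return bin(x & MASK).count("1")


def cttz_old(x):
    """Previous lowering: cttz(x) = ctpop((x & -x) - 1)."""
    return ctpop(((x & -x) - 1) & MASK)


def cttz_new(x):
    """New lowering: cttz(x) = ctpop(~x & (x - 1))."""
    return ctpop(~x & (x - 1) & MASK)
```

The new form is an AND of an inverted operand with another value, which is the shape the message associates with PANDN.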
* [X86][AVX] Add lowerVectorShuffleAsLanePermuteAndPermute for v4f64 shuffles (PR39161) | Simon Pilgrim | 2018-10-13 | 1 | -0/+60
  Add shuffle lowering for the case where we can shuffle the lanes into place followed by an in-lane permute. This is mainly for cases where we can have non-repeating permutes in each lane, but for now I've just enabled it for v4f64 unary shuffles to fix PR39161 - there is no test coverage for other shuffles that might benefit yet.

  We now have several cross-lane shuffle lowering methods that all do something similar - I've looked at merging some of these (notably by making the repeated mask mechanism in lowerVectorShuffleByMerging128BitLanes optional), but there are a lot of assertions/assumptions in the way that make this tricky - I ended up going for adding yet another relatively simple method instead.

  Differential Revision: https://reviews.llvm.org/D53148
  llvm-svn: 344446
* [AArch64] Swap comparison operands if that enables some folding. | Arnaud A. de Grandmaison | 2018-10-13 | 1 | -12/+74
  Summary:
  AArch64 can fold some shift+extend operations on the RHS operand of comparisons, so swap the operands if that makes sense.

  This provides a fix for https://bugs.llvm.org/show_bug.cgi?id=38751

  Reviewers: efriedma, t.p.northover, javed.absar
  Subscribers: mcrosier, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53067
  llvm-svn: 344439
* [WebAssembly] SIMD min and max | Thomas Lively | 2018-10-13 | 1 | -7/+7
  Summary: Depends on D52324 and D52764.

  Reviewers: aheejin, dschuff
  Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
  Differential Revision: https://reviews.llvm.org/D52325
  llvm-svn: 344438
* [WebAssembly][NFC] Unify ARGUMENT classes | Thomas Lively | 2018-10-13 | 5 | -45/+36
  Reviewers: aheejin, dschuff
  Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53172
  llvm-svn: 344436
* [RISCV] Eliminate unnecessary masking of promoted shift amounts | Alex Bradbury | 2018-10-12 | 1 | -3/+20
  SelectionDAGBuilder::visitShift will always zero-extend a shift amount when it is promoted to the ShiftAmountTy. This results in zero-extension (masking) which is unnecessary for RISC-V as the shift operations only read the lower 5 or 6 bits (RV32 or RV64).

  I initially proposed adding a getExtendForShiftAmount hook so the shift amount can be any-extended (D52975). @efriedma explained this was unsafe, so I have instead eliminated the unnecessary AND operations at instruction selection time in a manner similar to X86InstrCompiler.td.

  Differential Revision: https://reviews.llvm.org/D53224
  llvm-svn: 344432
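Why the mask is redundant can be shown with a model of the shift instruction. The sketch below assumes an 8-bit shift amount that was zero-extended (so the AND mask is 0xFF) and models `sll` as reading only the low log2(XLEN) bits of its second operand. The function names are hypothetical and this is not the TableGen pattern from the patch.

```python
def sll(rs1, rs2, xlen=32):
    """Model of RISC-V sll: only the low 5 (RV32) or 6 (RV64) bits of rs2 are used."""
    shamt = rs2 & (xlen - 1)
    return (rs1 << shamt) & ((1 << xlen) - 1)


def shift_with_zext_mask(x, amount, xlen=32):
    """Shift preceded by the AND that zero-extension of an i8 amount produces."""
    return sll(x, amount & 0xFF, xlen)


def shift_without_mask(x, amount, xlen=32):
    """Same shift with the AND removed."""
    return sll(x, amount, xlen)
```

Any AND mask that preserves at least the low 5 or 6 bits leaves the hardware-visible shift amount unchanged, so dropping it does not alter the result.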
* [X86] Improve type legalization of (v2i32/v4i16/v8i16 (bitcast (v2f32))) to avoid a stack temporary. | Craig Topper | 2018-10-12 | 1 | -7/+13
  llvm-svn: 344425