path: root/llvm/lib/Target
Commit log (each entry: commit message, then author, date, files changed, lines removed/added)
* [X86] Avoid uses of getZExtValue(). NFCI. (Simon Pilgrim, 2019-09-07, 1 file, -22/+19)
  Use getAPIntValue() directly. This is mainly a best-practice style issue, to help prevent fuzz tests from blowing up when an i12345 (or similarly wide) type is generated. Use the getConstantOperandVal/getConstantOperandAPInt wrappers where possible.
  llvm-svn: 371315
* [X86] Fix pshuflw formation from repeated shuffle mask (PR43230) (Nikita Popov, 2019-09-07, 1 file, -2/+2)
  Fix for https://bugs.llvm.org/show_bug.cgi?id=43230.
  When creating PSHUFLW from a repeated shuffle mask, we have to apply the checks to the repeated mask, not the original one. For the test case from PR43230 the inspected part of the original mask is all undef.
  Differential Revision: https://reviews.llvm.org/D67314
  llvm-svn: 371307
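A minimal sketch of why the check must run on the repeated mask, using a hypothetical v16i16 mask whose first 128-bit lane is all undef (the situation described for PR43230). The helpers `repeatedLaneMask` and `isPSHUFLWLaneMask` are illustrative stand-ins, not LLVM's functions; masks index a single source vector and -1 means undef.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical helper: returns the per-lane mask if every 128-bit lane uses the
// same in-lane pattern. LaneElts is the number of elements per lane (8 for i16).
std::optional<std::vector<int>> repeatedLaneMask(const std::vector<int>& Mask,
                                                 int LaneElts) {
  std::vector<int> Repeated(LaneElts, -1);
  for (int I = 0, E = static_cast<int>(Mask.size()); I != E; ++I) {
    int M = Mask[I];
    if (M < 0)
      continue;  // undef is compatible with any pattern
    if (M / LaneElts != I / LaneElts)
      return std::nullopt;  // element crosses a 128-bit lane
    int& Slot = Repeated[I % LaneElts];
    if (Slot < 0)
      Slot = M % LaneElts;
    else if (Slot != M % LaneElts)
      return std::nullopt;  // lanes disagree on the pattern
  }
  return Repeated;
}

// PSHUFLW permutes words 0-3 of a lane and passes words 4-7 through unchanged.
bool isPSHUFLWLaneMask(const std::vector<int>& LaneMask) {
  assert(LaneMask.size() == 8 && "expected one 128-bit lane of i16");
  for (int I = 0; I != 4; ++I)
    if (LaneMask[I] >= 4)
      return false;  // low half may only read from the low half
  for (int I = 4; I != 8; ++I)
    if (LaneMask[I] >= 0 && LaneMask[I] != I)
      return false;  // high half must be identity (or undef)
  return true;
}
```

With a first lane of all -1 and a second lane of {9, 8, 10, 11, 14, 13, 12, 15}, checking the first eight entries of the original mask accepts it (they are all undef), while checking the repeated mask {1, 0, 2, 3, 6, 5, 4, 7} rejects it, because PSHUFLW cannot move words in the upper half of a lane.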
* Fix MSVC "32-bit shift implicitly converted to 64 bits" warnings. NFCI. (Simon Pilgrim, 2019-09-07, 2 files, -2/+2)
  llvm-svn: 371302
* Replicate the change "[Alignment][NFC] Use Align with TargetLowering::setMinFunctionAlignment" on AVR to avoid a breakage. (Sylvestre Ledru, 2019-09-07, 1 file, -1/+1)
  See r371200 / https://reviews.llvm.org/D67229
  llvm-svn: 371293
* Change TargetLibraryInfo analysis passes to always require Function (Teresa Johnson, 2019-09-07, 5 files, -12/+19)
  Summary: This is the first change to enable the TLI to be built per-function so that -fno-builtin* handling can be migrated to use function attributes. See discussion on D61634 for background. This is an enabler for fixing handling of these options for LTO, for example.
  This change should not affect behavior, as the provided function is not yet used to build a specifically per-function TLI, but rather enables that migration.
  Most of the changes were very mechanical, e.g. passing a Function to the legacy analysis pass's getTLI interface, or, in module-level cases, adding a callback. This is similar to the way the per-function TTI analysis works.
  There was one place where we were looking for builtins but not in the context of a specific function. See FindCXAAtExit in lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround could provide the wrong behavior in some corner cases. Suggestions welcome.
  Reviewers: chandlerc, hfinkel
  Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D66428
  llvm-svn: 371284
* [AArch64][GlobalISel] Enable the localizer for optimized builds. (Amara Emerson, 2019-09-06, 1 file, -3/+1)
  Despite the fact that the localizer's original motivation was to fix horrendous constant spilling at -O0, shortening live ranges still has net benefits even with optimizations enabled. On an -Os build of CTMark, doing this improves code size by 0.5% geomean.
  There are a few regressions, e.g. bullet increases in size by 0.5%. In one example from bullet where code size increased slightly, GlobalISel was actually now generating the same code as SelectionDAG. So we have an opportunity in future to implement better heuristics for localization and therefore be *better* than SDAG in some cases. In relation to other optimizations, though, that one is relatively minor.
  Differential Revision: https://reviews.llvm.org/D67303
  llvm-svn: 371266
* GlobalISel: Support physical register inputs in patterns (Matt Arsenault, 2019-09-06, 1 file, -5/+7)
  llvm-svn: 371253
* Remove dead .seh_stackalloc parsing method in X86AsmParser (Reid Kleckner, 2019-09-06, 1 file, -14/+0)
  The shared COFF asm parser code handles this directive, since it is shared with AArch64. Spotted by Alexandre Ganea in review.
  llvm-svn: 371251
* AMDGPU: Fix typo (Matt Arsenault, 2019-09-06, 1 file, -4/+4)
  llvm-svn: 371249
* [X86] Use MOVSX by default instead of CBW to extend i8 to AX for i8 sdivrem. (Craig Topper, 2019-09-06, 1 file, -5/+8)
  We can use a MOVSX16 here, then rely on FixupBWInsts to change it to MOVSX32 if the upper bits are dead, with a special case to not promote if it could be turned into CBW. Then we can rely on X86MCInstLower to turn the MOVSX into CBW very late if register allocation worked out.
  Using MOVSX gives an opportunity to use the MOVSX as both a copy and a sign extend, since the input and output registers aren't tied together.
  Differential Revision: https://reviews.llvm.org/D67192
  llvm-svn: 371243
* [X86] Use MOVZX16rr8/MOVZX16rm8 when extending input for i8 udivrem. (Craig Topper, 2019-09-06, 1 file, -3/+3)
  We can rely on X86FixupBWInsts to turn these into MOVZX32. This simplifies a follow-up commit to use MOVSX for i8 sdivrem with a late optimization to use CBW when register allocation works out.
  llvm-svn: 371242
* [X86] Teach FixupBWInsts to turn MOVSX16rr8/MOVZX16rr8/MOVSX16rm8/MOVZX16rm8 into their 32-bit dest equivalents when the upper part of the register is dead. (Craig Topper, 2019-09-06, 1 file, -6/+48)
  llvm-svn: 371240
* [ARM] Add patterns for VSUB with q and r registers (Oliver Cruickshank, 2019-09-06, 1 file, -0/+9)
  Added patterns for VSUB to support q and r registers, which reduces pressure on q registers.
  llvm-svn: 371231
* [ARM] Add patterns for VADD with q and r registers (Oliver Cruickshank, 2019-09-06, 1 file, -0/+9)
  Added support for VADD to use q and r registers, which reduces pressure on q registers.
  llvm-svn: 371230
* [ARM] Add patterns for VMUL with q and r registers (Oliver Cruickshank, 2019-09-06, 1 file, -0/+9)
  Added support for VMUL to use an r register, which reduces pressure on the q registers.
  llvm-svn: 371229
* [AArch64][GlobalISel] Always fall back on tail calls with -tailcallopt (Jessica Paquette, 2019-09-06, 1 file, -0/+6)
  -tailcallopt requires that we perform different stack adjustments than with sibling calls. For example, the `@caller_to0_from8` function in test/CodeGen/AArch64/tail-call.ll requires that we adjust SP. Without -tailcallopt, this adjustment does not happen. With it, however, it is expected.
  So, to ensure that adding sibling call support doesn't break -tailcallopt, make CallLowering always fall back on possible tail calls when -tailcallopt is passed in.
  Update test/CodeGen/AArch64/tail-call.ll with a GlobalISel line to make sure that we don't differ from the SDAG implementation at any point.
  Differential Revision: https://reviews.llvm.org/D67245
  llvm-svn: 371227
* [ARM] Sink add/mul(shufflevector(insertelement())) for MVE instruction selection (Sam Tebbs, 2019-09-06, 1 file, -10/+48)
  This patch sinks add/mul(shufflevector(insertelement())) into the basic block in which they are used so that they can then be selected together. This is useful for various MVE instructions, such as vmla and others that take R registers.
  Loop tests have been added to the vmla test file to make sure vmlas are generated in loops.
  Differential revision: https://reviews.llvm.org/D66295
  llvm-svn: 371218
* [AMDGPU] Enable constant offset promotion to immediate operand for VMEM stores (Valery Pykhtin, 2019-09-06, 1 file, -4/+5)
  Differential revision: https://reviews.llvm.org/D66958
  llvm-svn: 371214
* [Alignment][NFC] Use Align with TargetLowering::setPrefFunctionAlignment (Guillaume Chatelet, 2019-09-06, 10 files, -14/+15)
  Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type: https://reviews.llvm.org/D64790
  Reviewers: courbet
  Subscribers: nemanjai, javed.absar, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, ychen, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67267
  llvm-svn: 371212
* [Alignment][NFC] Use Align with TargetLowering::setPrefLoopAlignment (Guillaume Chatelet, 2019-09-06, 5 files, -8/+10)
  Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type: https://reviews.llvm.org/D64790
  Reviewers: courbet
  Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, ychen, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67278
  llvm-svn: 371210
* [Alignment] fix dubious min function alignment (Guillaume Chatelet, 2019-09-06, 1 file, -1/+1)
  Summary: This was discovered while introducing the llvm::Align type. The original setMinFunctionAlignment took the alignment as a log2 value; judging by the comment, instructions are meant to be 2-byte aligned, not 4-byte aligned.
  Reviewers: uweigand
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67271
  llvm-svn: 371204
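To illustrate the log2-versus-bytes confusion: under the old interface a log2 value of 1 meant 2-byte alignment and 2 meant 4-byte alignment. The sketch uses a hypothetical `SimpleAlign` class, a simplified stand-in and not the real `llvm::Align` from D64790, that is built from a byte count and stores only the log2, so converting from the legacy encoding has to be spelled out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified alignment type (not llvm::Align). It is constructed
// from a byte count and stores only log2 of that value, so a non-power-of-two
// alignment cannot be represented.
class SimpleAlign {
public:
  explicit SimpleAlign(uint64_t Bytes) {
    assert(Bytes != 0 && (Bytes & (Bytes - 1)) == 0 &&
           "alignment must be a power of 2");
    while ((uint64_t(1) << ShiftValue) != Bytes)
      ++ShiftValue;
  }

  // Converting from the legacy log2 encoding is an explicit, named step.
  static SimpleAlign fromLog2(unsigned Log2) {
    assert(Log2 < 64 && "shift amount out of range");
    return SimpleAlign(uint64_t(1) << Log2);
  }

  uint64_t value() const { return uint64_t(1) << ShiftValue; }
  unsigned log2() const { return ShiftValue; }

private:
  unsigned ShiftValue = 0;
};
```

`SimpleAlign::fromLog2(1).value()` is 2 and `SimpleAlign::fromLog2(2).value()` is 4, which is exactly the distinction the fix above hinges on.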
* [Alignment][NFC] Use Align with TargetLowering::setMinFunctionAlignment (Guillaume Chatelet, 2019-09-06, 12 files, -14/+16)
  Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type: https://reviews.llvm.org/D64790
  Reviewers: courbet
  Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67229
  llvm-svn: 371200
* [DFAPacketizer] Track resources for packetized instructions (James Molloy, 2019-09-06, 1 file, -0/+11)
  This patch allows the DFAPacketizer to be queried after a packet is formed to work out which resources were allocated to the packetized instructions. This is particularly important for targets that do their own bundle packing: it's not sufficient to know simply that instructions can share a packet; which slots are used is also required for encoding.
  This extends the emitter to emit a side-table containing resource usage diffs for each state transition. The packetizer maintains a set of all possible resource states in its current state. After packetization is complete, all remaining resource states are possible packetization strategies.
  The side-table is only ~500K for Hexagon, but the extra tracking is disabled by default (most uses of the packetizer like MachinePipeliner don't care and don't need the extra maintained state).
  Differential Revision: https://reviews.llvm.org/D66936
  llvm-svn: 371198
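A simplified model of the idea: instead of only remembering whether instructions fit, keep every resource assignment that is still possible and prune the set as instructions are added. `ResourceTracker` and `PacketState` are hypothetical names; the real packetizer is driven by a TableGen-generated automaton plus the side-table, whereas this sketch enumerates states explicitly.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model, not LLVM's DFAPacketizer API. Each functional unit
// (slot) is one bit; an instruction lists the alternative slot masks it may use.
struct PacketState {
  uint32_t Used = 0;               // slots taken so far
  std::vector<uint32_t> PerInstr;  // slot mask chosen for each instruction
};

class ResourceTracker {
public:
  ResourceTracker() : States(1) {}  // start with one empty state

  // Try to add an instruction. Returns false, leaving the states unchanged, if
  // no surviving assignment has a free slot for it.
  bool tryAdd(const std::vector<uint32_t>& Alternatives) {
    std::vector<PacketState> Next;
    for (const PacketState& S : States)
      for (uint32_t Alt : Alternatives)
        if ((S.Used & Alt) == 0) {
          PacketState N = S;
          N.Used |= Alt;
          N.PerInstr.push_back(Alt);
          Next.push_back(std::move(N));
        }
    if (Next.empty())
      return false;
    States = std::move(Next);
    return true;
  }

  // Every surviving state is a valid way to encode the packet.
  const std::vector<PacketState>& possibleAllocations() const { return States; }

private:
  std::vector<PacketState> States;
};
```

With two slots (0b01 and 0b10), an instruction that may use either slot leaves two possible states; adding one that needs slot 0b01 prunes this to a single state, which tells the encoder the first instruction must occupy slot 0b10.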
* [AMDGPU] Mark s_barrier as having side effects but not accessing memory. (Jay Foad, 2019-09-06, 1 file, -2/+0)
  Summary: This fixes poor scheduling in a function containing a barrier and a few load instructions.
  Without this fix, ScheduleDAGInstrs::buildSchedGraph adds an artificial edge in the dependency graph from the barrier instruction to the exit node representing live-out latency, with a latency of about 500 cycles. Because of this it thinks the critical path through the graph also has a latency of about 500 cycles. And because of that it does not think that any of the load instructions are on the critical path, so it schedules them with no regard for their (80 cycle) latency, which gives poor results.
  Reviewers: arsenm, dstuttard, tpr, nhaehnle
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67218
  llvm-svn: 371192
* [ARM] MVE Tail Predication (Sam Parker, 2019-09-06, 4 files, -1/+476)
  The MVE and LOB extensions of Armv8.1-M can be combined to enable 'tail predication', which removes the need for a scalar remainder loop after vectorization. Lane predication is performed implicitly via a system register. The effects of predication are described in Section B5.6.3 of the Armv8.1-M Arch Reference Manual, the key points being:
  - For vector operations that perform reduction across the vector and produce a scalar result, whether the value is accumulated or not.
  - For non-load instructions, the predicate flags determine if the destination register byte is updated with the new value or if the previous value is preserved.
  - For vector store instructions, whether the store occurs or not.
  - For vector load instructions, whether the value is loaded or zeros are written to that element of the destination register.
  This patch implements a pass that takes a hardware loop containing masked vector instructions and converts it into something that resembles an MVE tail-predicated loop. Currently, if we had code generation, we'd generate a loop in which the VCTP would generate the predicate and VPST would then set up the value of VPR.P0. The loads and stores would be placed in VPT blocks, so this is not tail predication but normal VPT predication, with the predicate based upon an element-counting induction variable. Further work needs to be done to finally produce a true tail-predicated loop.
  Because only the loads and stores are predicated, at both the LLVM IR and MIR level, we will restrict support to only lane-wise operations (no horizontal reductions). We will perform a final check on MIR during loop finalisation too.
  Another restriction, specific to MVE, is that all the vector instructions need to operate on the same number of elements. This is because predication is performed at the byte level and this is set on entry to the loop, or by the VCTP instead.
  Differential Revision: https://reviews.llvm.org/D65884
  llvm-svn: 371179
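A scalar C++ emulation of the loop shape tail predication aims for, offered as an illustration only: a VCTP-style predicate enables lane `i` of each vector step when `i` is less than the number of remaining elements, so the last partial vector needs no scalar remainder loop. In keeping with the restriction that only loads and stores are predicated, the lane-wise add runs on every lane. `laneMask`, `addTailPredicated`, and the 4-lane width are assumptions for the sketch, not MVE intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // assumed: 4 x i32 in a 128-bit vector

// Emulates a VCTP-style predicate: lane L is active iff L < Remaining.
std::array<bool, kLanes> laneMask(std::size_t Remaining) {
  std::array<bool, kLanes> P{};
  for (std::size_t L = 0; L < kLanes; ++L)
    P[L] = L < Remaining;
  return P;
}

// Dst[i] = A[i] + B[i] written as one predicated vector loop with no scalar
// remainder loop, even when N is not a multiple of kLanes.
void addTailPredicated(const int* A, const int* B, int* Dst, std::size_t N) {
  assert((A && B && Dst) || N == 0);
  for (std::size_t I = 0; I < N; I += kLanes) {
    const std::array<bool, kLanes> P = laneMask(N - I);
    std::array<int, kLanes> VA{}, VB{}, VD{};
    for (std::size_t L = 0; L < kLanes; ++L)
      if (P[L]) {  // predicated loads: inactive lanes read as zero
        VA[L] = A[I + L];
        VB[L] = B[I + L];
      }
    for (std::size_t L = 0; L < kLanes; ++L)
      VD[L] = VA[L] + VB[L];  // lane-wise op runs unpredicated on every lane
    for (std::size_t L = 0; L < kLanes; ++L)
      if (P[L])  // predicated store: inactive lanes are left untouched
        Dst[I + L] = VD[L];
  }
}
```

For N = 6 the second iteration sees `laneMask(2)`, so only two lanes are loaded and stored and any memory past the sixth element is never written.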
* [X86] Fix bad indentation. NFC (Craig Topper, 2019-09-06, 1 file, -1/+1)
  llvm-svn: 371167
* AMDGPU/GlobalISel: Avoid repeating 32-bit type lists (Matt Arsenault, 2019-09-06, 4 files, -6/+14)
  llvm-svn: 371156
* AMDGPU/GlobalISel: Fix load/store of types in other address spaces (Matt Arsenault, 2019-09-06, 2 files, -5/+26)
  There should probably be a size-only matcher.
  llvm-svn: 371155
* AMDGPU: Allow getMemOperandWithOffset to analyze stack accesses (Matt Arsenault, 2019-09-05, 1 file, -2/+19)
  Report soffset as a base register if the scratch resource can be ignored.
  llvm-svn: 371149
* AMDGPU: Fix emitting multiple stack loads for stack passed workitems (Matt Arsenault, 2019-09-05, 1 file, -1/+15)
  The same stack slot is loaded for each workitem ID and for each use. Nothing prevents you from creating multiple fixed stack objects with the same offsets, so this was creating a load for each unique frame index, despite them being the same offset. Re-use the same frame index so the loads are CSEable.
  llvm-svn: 371148
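The fix amounts to handing out one frame index per incoming stack offset, so repeated requests share an index and the resulting loads are identical and therefore CSE-able. A minimal sketch under stated assumptions: `FixedObjectCache` is a hypothetical helper, not `MachineFrameInfo`'s API, it keys only on the offset for brevity, and it mimics LLVM's convention of giving fixed objects negative frame indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical sketch: one frame index per fixed stack offset, reused on every
// later request for the same offset.
class FixedObjectCache {
public:
  int getOrCreate(int64_t Offset) {
    auto [It, Inserted] = IndexForOffset.try_emplace(Offset, NextIndex);
    if (Inserted)
      --NextIndex;  // fixed objects use negative frame indices
    return It->second;
  }

  std::size_t numObjects() const { return IndexForOffset.size(); }

private:
  std::map<int64_t, int> IndexForOffset;
  int NextIndex = -1;
};
```

Asking for offset 0 twice returns the same index, so only one stack object (and, after CSE, one load) exists for it.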
* AMDGPU: Fix Register copypaste error (Matt Arsenault, 2019-09-05, 1 file, -2/+2)
  llvm-svn: 371141
* AMDGPU: Avoid constructing new std::vector in initCandidate (Matt Arsenault, 2019-09-05, 2 files, -2/+5)
  Approximately 30% of the time was spent in the std::vector constructor. In one testcase this pushes the scheduler to being the second slowest pass.
  I'm not sure I understand why these vectors are necessary. The default scheduler's initCandidate seems to use some pre-existing vectors for the pressure.
  llvm-svn: 371136
* Recommit "[AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls" (Jessica Paquette, 2019-09-05, 2 files, -7/+190)
  Recommit basic sibling call lowering (https://reviews.llvm.org/D67189)
  The issue was that if you have a return type other than void, call lowering will emit COPYs to get the return value after the call. Disallow sibling calls other than ones that return void for now. Also proactively disable swifterror tail calls for now, since there's a similar issue with COPYs there.
  Update call-translator-tail-call.ll to include test cases for each of these things.
  llvm-svn: 371114
* [X86] Enable BuildSDIVPow2 for i16. (Craig Topper, 2019-09-05, 1 file, -2/+3)
  We're able to use a 32-bit ADD and CMOV here, and this should work well with our other i16->i32 promotion optimizations.
  llvm-svn: 371107
* [X86] Override BuildSDIVPow2 for X86. (Craig Topper, 2019-09-05, 2 files, -0/+58)
  As noted in PR43197, we can use test+add+cmov+sra to implement signed division by a power of 2.
  This is based off the similar version in AArch64, but I've adjusted it to use target independent nodes where AArch64 uses target specific CMP and CSEL nodes. I've also blocked INT_MIN as the transform isn't valid for that.
  I've limited this to i32 and i64 on 64-bit targets for now and only when CMOV is supported. i8 and i16 need further investigation to be sure they get promoted to i32 well.
  I adjusted a few tests to enable cmov to demonstrate the new codegen. I also changed twoaddr-coalesce-3.ll to 32-bit mode without cmov to avoid perturbing the scenario that is being set up there.
  Differential Revision: https://reviews.llvm.org/D67087
  llvm-svn: 371104
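The test+add+cmov+sra sequence can be modelled in portable C++ as below; this is a sketch of the arithmetic only, not the DAG code. `sdivPow2` is a hypothetical name. It assumes a divisor of +/-2^K with 0 < K < 31 (so INT_MIN, the excluded case, never arises) and an arithmetic right shift of negative values, which C++20 guarantees and mainstream compilers provide anyway. Negating the quotient for a negative divisor is this sketch's choice for completeness.

```cpp
#include <cassert>
#include <cstdint>

// Signed division of X by +/-2^K in the shape test + add + cmov + sra.
int32_t sdivPow2(int32_t X, unsigned K, bool NegativeDivisor) {
  assert(K > 0 && K < 31 && "divisor must be +/-2^K with 0 < K < 31");
  // test + cmov: only negative dividends get the bias (2^K - 1), which makes
  // the shift round toward zero like sdiv instead of toward -infinity.
  // (Hardware computes both values and selects; the ternary avoids signed
  // overflow in C++ when X is large and positive.)
  int32_t Sel = X < 0 ? X + ((int32_t(1) << K) - 1) : X;
  int32_t Q = Sel >> K;  // sra
  return NegativeDivisor ? -Q : Q;
}
```

For example, -7 / 2 biases -7 to -6 before the shift, giving -3 rather than the -4 a bare arithmetic shift would produce.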
* [x86] fix horizontal math bug exposed by improved demanded elements analysis (PR43225) (Sanjay Patel, 2019-09-05, 1 file, -5/+24)
  https://bugs.llvm.org/show_bug.cgi?id=43225
  llvm-svn: 371095
* [X86] Add a FIXME about why the CWD/CDQ/CQO have a bogus implicit def of the A register. NFC (Craig Topper, 2019-09-05, 1 file, -6/+5)
  The instructions copy the sign bit of the A register to every bit of the D register. But they don't write to the A register.
  llvm-svn: 371094
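A small model of the semantics just described, using CDQ (the 32-bit member of the family): EDX becomes 32 copies of EAX's sign bit while EAX is only read, so EDX:EAX ends up as the 64-bit sign extension of EAX. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of CDQ: returns the value EDX receives, i.e. 32 copies of
// the sign bit of EAX. EAX itself is only read, never written.
uint32_t cdqHighHalf(uint32_t Eax) {
  return (Eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}

// EDX:EAX after CDQ equals the 64-bit sign extension of EAX. (Converting the
// uint64_t to int64_t relies on two's complement, guaranteed since C++20.)
int64_t edxEaxAfterCdq(uint32_t Eax) {
  uint64_t EdxEax = (static_cast<uint64_t>(cdqHighHalf(Eax)) << 32) | Eax;
  return static_cast<int64_t>(EdxEax);
}
```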
* [X86] Fix stale comment. NFC (Craig Topper, 2019-09-05, 1 file, -2/+2)
  We aren't checking for a concat here. We're just always splitting 256-bit stores.
  llvm-svn: 371092
* [Hexagon] Fix type in HexagonTargetLowering::ReplaceNodeResults (Krzysztof Parzyszek, 2019-09-05, 1 file, -1/+2)
  llvm-svn: 371083
* [ARM] Add support for the s,j,x,N,O inline asm constraints (David Candler, 2019-09-05, 1 file, -3/+3)
  A number of inline assembly constraints are currently supported by LLVM, but rejected as invalid by Clang:
  Target-independent constraints:
  - s: An integer constant, but allowing only relocatable values
  ARM-specific constraints:
  - j: An immediate integer between 0 and 65535 (valid for MOVW)
  - x: A 32, 64, or 128-bit floating-point/SIMD register: s0-s15, d0-d7, or q0-q3
  - N: An immediate integer between 0 and 31 (Thumb1 only)
  - O: An immediate integer which is a multiple of 4 between -508 and 508 (Thumb1 only)
  This patch adds support to Clang for the missing constraints, along with some checks to ensure that the constraints are used with the correct target and Thumb mode, and that immediates are within valid ranges (at least where possible). The constraints are already implemented in LLVM, but a couple of minor corrections to the checks were needed (V8M Baseline includes MOVW, so it should work with 'j'; 'N' and 'O' shouldn't be valid in Thumb2) so that Clang and LLVM are in line with each other and the documentation.
  Differential Revision: https://reviews.llvm.org/D65863
  Change-Id: I18076619e319bac35fbb60f590c069145c9d9a0a
  llvm-svn: 371079
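The immediate ranges in the list above translate directly into a range check. This is a hypothetical helper with a simplified notion of the instruction set; it ignores finer subtarget details (such as whether MOVW exists for 'j'), which the real Clang and LLVM checks also consider.

```cpp
#include <cassert>
#include <cstdint>

// Simplified execution modes for the sketch.
enum class Mode { ARM, Thumb1, Thumb2 };

// Hypothetical validator for the immediate-taking constraints listed above.
bool isValidImmediate(char Constraint, int64_t Value, Mode M) {
  switch (Constraint) {
  case 'j':  // 16-bit unsigned immediate, as accepted by MOVW
    return Value >= 0 && Value <= 65535;
  case 'N':  // Thumb1 only: 0..31
    return M == Mode::Thumb1 && Value >= 0 && Value <= 31;
  case 'O':  // Thumb1 only: multiple of 4 in [-508, 508]
    return M == Mode::Thumb1 && Value >= -508 && Value <= 508 &&
           Value % 4 == 0;
  default:  // not an immediate constraint handled by this sketch
    return false;
  }
}
```

Note that `Value % 4 == 0` also behaves correctly for negative values in C++ (for example, -6 % 4 is -2), so -508 is accepted and -6 is rejected.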
* [X86][SSE] EltsFromConsecutiveLoads - ignore non-zero offset base loads (PR43227) (Simon Pilgrim, 2019-09-05, 1 file, -0/+4)
  As discussed on D64551 and PR43227, we don't correctly handle cases where the base load has a non-zero byte offset. Until we can properly handle this, we must bail from EltsFromConsecutiveLoads.
  llvm-svn: 371078
* [ARM] Fixup the creation of VPT blocks (David Green, 2019-09-05, 1 file, -15/+20)
  This attempts to just fix the creation of VPT blocks, fixing up the iterating, which instructions are considered in the bundle, and making sure that we do not overrun the end of the block.
  Differential Revision: https://reviews.llvm.org/D67219
  llvm-svn: 371064
* [LLVM][Alignment] Convert isLegalNTStore/isLegalNTLoad to llvm::Align (Guillaume Chatelet, 2019-09-05, 2 files, -4/+4)
  Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type: https://reviews.llvm.org/D64790
  Reviewers: courbet
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67223
  llvm-svn: 371063
* [MIPS GlobalISel] Select G_FENCE (Petar Avramovic, 2019-09-05, 1 file, -0/+4)
  G_FENCE comes from the fence instruction. For MIPS, fence is generated in AtomicExpandPass, which surrounds an atomic instruction with fence instructions when needed. G_FENCE arguments don't have an LLT, so there is nothing for the legalizer and regbankselect to do. Instruction-select G_FENCE for MIPS32.
  Differential Revision: https://reviews.llvm.org/D67181
  llvm-svn: 371056
* [MIPS GlobalISel] Select llvm.trap intrinsic (Petar Avramovic, 2019-09-05, 1 file, -1/+14)
  Select G_INTRINSIC_W_SIDE_EFFECTS for Intrinsic::trap for MIPS32 via legalizeIntrinsic.
  Differential Revision: https://reviews.llvm.org/D67180
  llvm-svn: 371055
* [MIPS GlobalISel] Lower SRet pointer arguments (Petar Avramovic, 2019-09-05, 1 file, -1/+3)
  Instead of returning a structure by value, clang usually adds a pointer to that structure as an argument. Pointers don't require special handling regardless of the SRet flag. Remove the unsuccessful exit from lowerCall for arguments with the SRet flag if they are pointers.
  Differential Revision: https://reviews.llvm.org/D67179
  llvm-svn: 371054
* Revert rL370996 from llvm/trunk: [AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls (Simon Pilgrim, 2019-09-05, 2 files, -173/+7)
  This adds support for basic sibling call lowering in AArch64. The intent here is to only handle tail calls which do not change the ABI (hence, sibling calls.)
  At this point, it is very restricted. It does not handle
  - Vararg calls.
  - Calls with outgoing arguments.
  - Calls whose calling conventions differ from the caller's calling convention.
  - Tail/sibling calls with BTI enabled.
  This patch adds
  - `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent to the same function in AArch64ISelLowering.cpp (albeit with the restrictions above.)
  - `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in AArch64ISelLowering.cpp.
  - `getCallOpcode`, which is exactly what it sounds like.
  Tail/sibling calls are lowered by checking if they pass target-independent tail call positioning checks, and checking if they satisfy `isEligibleForTailCallOptimization`. If they do, then a tail call instruction is emitted instead of a normal call. If we have a sibling call (which is always the case in this patch), then we do not emit any stack adjustment operations. When we go to lower a return, we check if we've already emitted a tail call. If so, then we skip the return lowering.
  For testing, this patch
  - Adds call-translator-tail-call.ll to test which tail calls we currently lower, which ones we don't, and which ones we shouldn't.
  - Updates branch-target-enforcement-indirect-calls.ll to show that we fall back as expected.
  Differential Revision: https://reviews.llvm.org/D67189
  ........
  This fails on EXPENSIVE_CHECKS builds due to a -verify-machineinstrs test failure in CodeGen/AArch64/dllimport.ll
  llvm-svn: 371051
* [X86] X86SpeculativeLoadHardeningPass::canHardenRegister - fix out of bounds warning. (Simon Pilgrim, 2019-09-05, 1 file, -2/+5)
  Fixes clang static-analyzer warning.
  llvm-svn: 371050
* [SystemZ] Recognize INLINEASM_BR in backend (Jonas Paulsson, 2019-09-05, 4 files, -18/+32)
  Handle the remaining cases also by handling asm goto in SystemZInstrInfo::getBranchInfo().
  Review: Ulrich Weigand
  https://reviews.llvm.org/D67151
  llvm-svn: 371048
* [X86] X86InstrInfo::optimizeCompareInstr - fix potential null dereference. (Simon Pilgrim, 2019-09-05, 1 file, -2/+3)
  Fixes clang static-analyzer warning.
  Technically the MachineInstr *Sub might still be null if we're comparing zero (IsCmpZero == true), although this probably won't happen as SrcReg2 is probably == 0.
  llvm-svn: 371047
OpenPOWER on IntegriCloud