path: root/llvm/lib/Target
Commit message / Author / Date / Files changed / Lines (-/+)
* [X86] Fix bad indentation. NFC
  Craig Topper, 2019-09-06 (1 file, -1/+1)
  llvm-svn: 371167
* AMDGPU/GlobalISel: Avoid repeating 32-bit type lists
  Matt Arsenault, 2019-09-06 (4 files, -6/+14)
  llvm-svn: 371156
* AMDGPU/GlobalISel: Fix load/store of types in other address spaces
  Matt Arsenault, 2019-09-06 (2 files, -5/+26)
  There should probably be a size only matcher.
  llvm-svn: 371155
* AMDGPU: Allow getMemOperandWithOffset to analyze stack accesses
  Matt Arsenault, 2019-09-05 (1 file, -2/+19)
  Report soffset as a base register if the scratch resource can be ignored.
  llvm-svn: 371149
* AMDGPU: Fix emitting multiple stack loads for stack-passed workitems
  Matt Arsenault, 2019-09-05 (1 file, -1/+15)
  The same stack slot is loaded for each workitem ID and each use. Nothing
  prevents creating multiple fixed stack objects with the same offset, so
  this was creating a load for each unique frame index, despite them having
  the same offset. Reuse the same frame index so the loads are CSE-able.
  llvm-svn: 371148
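  A minimal sketch of the reuse idea, with hypothetical names (the real
  change works on MachineFunction frame objects): memoize the frame index
  created for each fixed-stack offset so repeated queries return the same
  index and the resulting loads become CSE-able.

    #include <map>

    // Hypothetical cache: one frame index per fixed-stack offset, so a
    // second query for the same offset reuses the first index.
    struct FrameIndexCache {
      std::map<long long, int> ByOffset; // offset -> frame index
      int NextIndex = 0;

      int getOrCreate(long long Offset) {
        auto It = ByOffset.find(Offset);
        if (It != ByOffset.end())
          return It->second; // reused index: loads from it can be CSE'd
        return ByOffset[Offset] = NextIndex++;
      }
    };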
* AMDGPU: Fix Register copy-paste error
  Matt Arsenault, 2019-09-05 (2 files, -2/+2)
  llvm-svn: 371141
* AMDGPU: Avoid constructing new std::vector in initCandidate
  Matt Arsenault, 2019-09-05 (2 files, -2/+5)
  Approximately 30% of the time was spent in the std::vector constructor.
  In one testcase this pushes the scheduler to being the second slowest
  pass. I'm not sure I understand why these vectors are necessary. The
  default scheduler's initCandidate seems to use some pre-existing vectors
  for the pressure.
  llvm-svn: 371136
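  A sketch of the avoidance pattern (hypothetical class, not the actual
  scheduler code): hoist the vector into a member and clear() it per call,
  so its capacity is reused instead of reallocated every time.

    #include <vector>

    // Reuse a member vector across initCandidate calls rather than
    // constructing a fresh std::vector each time.
    struct SchedCandidateState {
      std::vector<unsigned> Pressure; // storage survives between calls

      void initCandidate() {
        Pressure.clear(); // keeps capacity; no new allocation
        // ... recompute pressure values into Pressure ...
      }
    };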
* Recommit "[AArch64][GlobalISel] Teach AArch64CallLowering to handle basic ↵Jessica Paquette2019-09-052-7/+190
| | | | | | | | | | | | | | | | | | sibling calls" Recommit basic sibling call lowering (https://reviews.llvm.org/D67189) The issue was that if you have a return type other than void, call lowering will emit COPYs to get the return value after the call. Disallow sibling calls other than ones that return void for now. Also proactively disable swifterror tail calls for now, since there's a similar issue with COPYs there. Update call-translator-tail-call.ll to include test cases for each of these things. llvm-svn: 371114
* [X86] Enable BuildSDIVPow2 for i16.
  Craig Topper, 2019-09-05 (1 file, -2/+3)
  We're able to use a 32-bit ADD and CMOV here, and it should work well
  with our other i16->i32 promotion optimizations.
  llvm-svn: 371107
* [X86] Override BuildSDIVPow2 for X86.
  Craig Topper, 2019-09-05 (2 files, -0/+58)
  As noted in PR43197, we can use test+add+cmov+sra to implement signed
  division by a power of 2. This is based off the similar version in
  AArch64, but I've adjusted it to use target-independent nodes where
  AArch64 uses target-specific CMP and CSEL nodes. I've also blocked
  INT_MIN as the transform isn't valid for that.
  I've limited this to i32 and i64 on 64-bit targets for now and only when
  CMOV is supported. i8 and i16 need further investigation to be sure they
  get promoted to i32 well.
  I adjusted a few tests to enable cmov to demonstrate the new codegen. I
  also changed twoaddr-coalesce-3.ll to 32-bit mode without cmov to avoid
  perturbing the scenario that is being set up there.
  Differential Revision: https://reviews.llvm.org/D67087
  llvm-svn: 371104
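  The expansion can be sketched in plain C++ (illustrative only; the actual
  patch builds the equivalent SelectionDAG nodes). For x / 2^k with C's
  round-toward-zero semantics, a negative dividend needs a bias of 2^k - 1
  before the arithmetic shift, and the select is what lowers to test+cmov:

    #include <cassert>

    // Signed division by a power of two: the conditional add of (2^k - 1)
    // maps to TEST+ADD+CMOV, the final shift to SRA.
    int sdivPow2(int X, unsigned K) {
      int Bias = (1 << K) - 1;          // e.g. 7 for division by 8
      int Src = (X < 0) ? X + Bias : X; // becomes test + add + cmov
      return Src >> K;                  // arithmetic shift right
    }

    int main() {
      assert(sdivPow2(25, 3) == 3);   // 25 / 8
      assert(sdivPow2(-25, 3) == -3); // rounds toward zero, not to -4
      return 0;
    }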
* [x86] fix horizontal math bug exposed by improved demanded elements analysis (PR43225)
  Sanjay Patel, 2019-09-05 (1 file, -5/+24)
  https://bugs.llvm.org/show_bug.cgi?id=43225
  llvm-svn: 371095
* [X86] Add a FIXME about why the CWD/CDQ/CQO have a bogus implicit def of the A register. NFC
  Craig Topper, 2019-09-05 (1 file, -6/+5)
  The instructions copy the sign bit of the A register to every bit of the
  D register. But they don't write to the A register.
  llvm-svn: 371094
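  The described semantics, as a C++ sketch (not LLVM code): the D register
  receives the A register's sign bit replicated to every bit, while A is
  only read.

    #include <cstdint>

    // CDQ-like behavior: the result (EDX) is the sign of EAX replicated,
    // 0xFFFFFFFF for negative inputs and 0 otherwise; EAX is input-only.
    uint32_t signExtendHigh(int32_t Eax) {
      return uint32_t(Eax >> 31);
    }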
* [X86] Fix stale comment. NFC
  Craig Topper, 2019-09-05 (1 file, -2/+2)
  We aren't checking for a concat here. We're just always splitting 256-bit
  stores.
  llvm-svn: 371092
* [Hexagon] Fix type in HexagonTargetLowering::ReplaceNodeResults
  Krzysztof Parzyszek, 2019-09-05 (1 file, -1/+2)
  llvm-svn: 371083
* [ARM] Add support for the s,j,x,N,O inline asm constraints
  David Candler, 2019-09-05 (1 file, -3/+3)
  A number of inline assembly constraints are currently supported by LLVM,
  but rejected as invalid by Clang:
  Target-independent constraints:
    s: An integer constant, but allowing only relocatable values
  ARM-specific constraints:
    j: An immediate integer between 0 and 65535 (valid for MOVW)
    x: A 32-, 64-, or 128-bit floating-point/SIMD register: s0-s15, d0-d7,
       or q0-q3
    N: An immediate integer between 0 and 31 (Thumb1 only)
    O: An immediate integer which is a multiple of 4 between -508 and 508
       (Thumb1 only)
  This patch adds support to Clang for the missing constraints, along with
  some checks to ensure that the constraints are used with the correct
  target and Thumb mode, and that immediates are within valid ranges (at
  least where possible). The constraints are already implemented in LLVM;
  just a couple of minor corrections to checks were needed (V8M Baseline
  includes MOVW so should work with 'j'; 'N' and 'O' shouldn't be valid in
  Thumb2) so that Clang and LLVM are in line with each other and the
  documentation.
  Differential Revision: https://reviews.llvm.org/D65863
  Change-Id: I18076619e319bac35fbb60f590c069145c9d9a0a
  llvm-svn: 371079
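  As an illustration of one of these constraints (a hedged example: exact
  acceptance depends on the target and on a Clang containing this patch),
  'j' requires an immediate in [0, 65535], which fits a single MOVW when
  targeting, e.g., armv7:

    // Hypothetical use of the ARM-specific 'j' constraint.
    unsigned loadMovwImmediate() {
      unsigned Out;
      asm("movw %0, %1" : "=r"(Out) : "j"(4095)); // 4095 fits in 16 bits
      return Out;
    }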
* [X86][SSE] EltsFromConsecutiveLoads - ignore non-zero offset base loads (PR43227)
  Simon Pilgrim, 2019-09-05 (1 file, -0/+4)
  As discussed on D64551 and PR43227, we don't correctly handle cases where
  the base load has a non-zero byte offset. Until we can properly handle
  this, we must bail from EltsFromConsecutiveLoads.
  llvm-svn: 371078
* [ARM] Fixup the creation of VPT blocks
  David Green, 2019-09-05 (1 file, -15/+20)
  This attempts to just fix the creation of VPT blocks, fixing up the
  iteration, which instructions are considered in the bundle, and making
  sure that we do not overrun the end of the block.
  Differential Revision: https://reviews.llvm.org/D67219
  llvm-svn: 371064
* [LLVM][Alignment] Convert isLegalNTStore/isLegalNTLoad to llvm::Align
  Guillaume Chatelet, 2019-09-05 (2 files, -4/+4)
  Summary: This patch is part of a series to introduce an Alignment type.
  See this thread for context:
  http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type:
  https://reviews.llvm.org/D64790
  Reviewers: courbet
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67223
  llvm-svn: 371063
* [MIPS GlobalISel] Select G_FENCE
  Petar Avramovic, 2019-09-05 (1 file, -0/+4)
  G_FENCE comes from the fence instruction. For MIPS, fences are generated
  in AtomicExpandPass, which surrounds atomic instructions with fence
  instructions when needed. G_FENCE arguments don't have an LLT, so there
  is no job for the legalizer and regbankselect. Instruction-select G_FENCE
  for MIPS32.
  Differential Revision: https://reviews.llvm.org/D67181
  llvm-svn: 371056
* [MIPS GlobalISel] Select llvm.trap intrinsic
  Petar Avramovic, 2019-09-05 (1 file, -1/+14)
  Select G_INTRINSIC_W_SIDE_EFFECTS for Intrinsic::trap for MIPS32 via
  legalizeIntrinsic.
  Differential Revision: https://reviews.llvm.org/D67180
  llvm-svn: 371055
* [MIPS GlobalISel] Lower SRet pointer arguments
  Petar Avramovic, 2019-09-05 (1 file, -1/+3)
  Instead of returning a structure by value, clang usually adds a pointer
  to that structure as an argument. Pointers don't require special handling
  regardless of the SRet flag. Remove the unsuccessful exit from lowerCall
  for arguments with the SRet flag if they are pointers.
  Differential Revision: https://reviews.llvm.org/D67179
  llvm-svn: 371054
* Revert rL370996 from llvm/trunk: [AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls
  Simon Pilgrim, 2019-09-05 (2 files, -173/+7)
  This reverts the basic sibling call lowering patch (Differential
  Revision: https://reviews.llvm.org/D67189; its full description is
  reproduced with the original commit, r370996, below).
  ........
  This fails on EXPENSIVE_CHECKS builds due to a -verify-machineinstrs test
  failure in CodeGen/AArch64/dllimport.ll
  llvm-svn: 371051
* [X86] X86SpeculativeLoadHardeningPass::canHardenRegister - fix out of bounds warning.
  Simon Pilgrim, 2019-09-05 (1 file, -2/+5)
  Fixes clang static-analyzer warning.
  llvm-svn: 371050
* [SystemZ] Recognize INLINEASM_BR in backend
  Jonas Paulsson, 2019-09-05 (4 files, -18/+32)
  Handle the remaining cases as well, by handling asm goto in
  SystemZInstrInfo::getBranchInfo().
  Review: Ulrich Weigand
  https://reviews.llvm.org/D67151
  llvm-svn: 371048
* [X86] X86InstrInfo::optimizeCompareInstr - fix potential null dereference.
  Simon Pilgrim, 2019-09-05 (1 file, -2/+3)
  Fixes clang static-analyzer warning. Technically the MachineInstr *Sub
  might still be null if we're comparing zero (IsCmpZero == true), although
  this probably won't happen as SrcReg2 is probably == 0.
  llvm-svn: 371047
* [LLVM][Alignment] Make functions using log of alignment explicit
  Guillaume Chatelet, 2019-09-05 (37 files, -144/+146)
  Summary: This patch renames functions that take or return alignment as
  log2; it will help with the transition to llvm::Align. The renaming makes
  it explicit that we deal with log(alignment) instead of a power-of-two
  alignment.
  A few renames uncovered dubious assignments:
  - `MirParser`/`MirPrinter` was expecting powers of two, but
    `MachineFunction` and `MachineBasicBlock` were using log2(align). This
    patch fixes it and updates the documentation.
  - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and
    `align-all-nofallthru-blocks`) supposedly interpreted as power-of-two
    alignments; internally these values are interpreted as log2(align).
    This patch updates the documentation.
  - `MachineFunction` exposes `align-all-functions`, also interpreted as a
    power-of-two alignment; internally this value is interpreted as
    log2(align). This patch updates the documentation.
  Reviewers: lattner, thegameg, courbet
  Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai,
  jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb,
  rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay,
  zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult,
  the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits,
  courbet
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65945
  llvm-svn: 371045
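  A small sketch of the convention the renaming makes explicit (the helper
  is hypothetical, not an LLVM API): a LogAlign parameter means the
  alignment is 2^LogAlign bytes.

    #include <cassert>
    #include <cstdint>

    // Round Offset up to a 2^LogAlign byte boundary; naming the parameter
    // LogAlign makes the log2 convention visible at every call site.
    uint64_t alignToLog2(uint64_t Offset, unsigned LogAlign) {
      uint64_t Align = uint64_t(1) << LogAlign;
      return (Offset + Align - 1) & ~(Align - 1);
    }

    int main() {
      assert(alignToLog2(13, 4) == 16); // LogAlign 4 = 16-byte alignment
      return 0;
    }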
* AMDGPU: Add intrinsics for address space identification
  Matt Arsenault, 2019-09-05 (5 files, -1/+50)
  The library currently uses ptrtoint and directly checks the queue ptr for
  this, which counts as a pointer capture.
  llvm-svn: 371009
* AMDGPU/GlobalISel: Restore insert point when getting aperture
  Matt Arsenault, 2019-09-05 (1 file, -0/+6)
  Avoids SSA violations in a future patch.
  llvm-svn: 371008
* AMDGPU/GlobalISel: Fix placeholder value used for addrspacecast
  Matt Arsenault, 2019-09-05 (1 file, -4/+6)
  llvm-svn: 371007
* AMDGPU/GlobalISel: Fix assert on load from constant address
  Matt Arsenault, 2019-09-05 (1 file, -4/+4)
  llvm-svn: 371006
* [AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls
  Jessica Paquette, 2019-09-04 (2 files, -7/+173)
  This adds support for basic sibling call lowering in AArch64. The intent
  here is to only handle tail calls which do not change the ABI (hence,
  sibling calls).
  At this point, it is very restricted. It does not handle
  - Vararg calls.
  - Calls with outgoing arguments.
  - Calls whose calling conventions differ from the caller's calling
    convention.
  - Tail/sibling calls with BTI enabled.
  This patch adds
  - `AArch64CallLowering::isEligibleForTailCallOptimization`, which is
    equivalent to the same function in AArch64ISelLowering.cpp (albeit
    with the restrictions above).
  - `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to
    those in AArch64ISelLowering.cpp.
  - `getCallOpcode`, which is exactly what it sounds like.
  Tail/sibling calls are lowered by checking if they pass
  target-independent tail call positioning checks, and checking if they
  satisfy `isEligibleForTailCallOptimization`. If they do, then a tail call
  instruction is emitted instead of a normal call. If we have a sibling
  call (which is always the case in this patch), then we do not emit any
  stack adjustment operations. When we go to lower a return, we check if
  we've already emitted a tail call. If so, then we skip the return
  lowering.
  For testing, this patch
  - Adds call-translator-tail-call.ll to test which tail calls we
    currently lower, which ones we don't, and which ones we shouldn't.
  - Updates branch-target-enforcement-indirect-calls.ll to show that we
    fall back as expected.
  Differential Revision: https://reviews.llvm.org/D67189
  llvm-svn: 370996
* AMDGPU/GlobalISel: Select G_BITREVERSE
  Matt Arsenault, 2019-09-04 (2 files, -1/+2)
  llvm-svn: 370980
* GlobalISel: Add basic legalization for G_BITREVERSE
  Matt Arsenault, 2019-09-04 (1 file, -1/+1)
  llvm-svn: 370979
* AMDGPU: Handle frame index expansion with no free SGPRs pre-gfx9
  Matt Arsenault, 2019-09-04 (2 files, -26/+58)
  Since an add instruction must produce an unused carry out, this requires
  additional SGPRs. This can be avoided by keeping the entire offset
  computation in SGPRs. If one SGPR is still available, this only costs one
  extra mov. If none are available, the entire computation can be done in
  place and reversed.
  This does assume the use is a VGPR operand. This was already assumed, and
  we currently only select frame indexes to VALU instructions. This should
  probably be fixed at some point to handle more possible MIR.
  llvm-svn: 370929
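  The "done in place and reversed" fallback can be sketched abstractly (a
  hypothetical C++ stand-in for the register-level trick): with no free
  register to hold the sum, mutate the base value, consume it, then undo
  the mutation.

    #include <cstdint>

    // Form Base+Offset in Base itself, use it, then subtract the offset
    // back so Base holds its original value again afterwards.
    uint32_t useAtOffset(uint32_t &Base, uint32_t Offset) {
      Base += Offset;         // compute the address in place
      uint32_t Loaded = Base; // stand-in for the actual use
      Base -= Offset;         // reverse the computation
      return Loaded;
    }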
* AMDGPU/GlobalISel: Make 16-bit constants legal
  Matt Arsenault, 2019-09-04 (1 file, -11/+5)
  This is mostly for the benefit of patterns which use 16-bit constants.
  llvm-svn: 370921
* [Hexagon] Improve generated code for test-if-bit-clear, one more time
  Krzysztof Parzyszek, 2019-09-04 (1 file, -18/+29)
  Adjust isel patterns after recent commit. Fixes https://llvm.org/PR43194.
  llvm-svn: 370913
* [ARM][ParallelDSP] SExt mul for accumulation
  Sam Parker, 2019-09-04 (1 file, -5/+14)
  For any unpaired muls, we accumulate them as an input to the reduction.
  Check the type of the mul and perform a sext if the existing accumulator
  input type is not the same.
  Differential Revision: https://reviews.llvm.org/D66993
  llvm-svn: 370851
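  A sketch of the widening step (plain C++ stand-in for the IR transform):
  the unpaired 16-bit product is sign-extended to the accumulator's wider
  type before being added to the reduction.

    #include <cstdint>

    // Unpaired mul feeding a 32-bit reduction: both 16-bit operands are
    // sign-extended (the sext the patch inserts) so the product matches
    // the accumulator's type.
    int32_t accumulateUnpairedMul(int32_t Acc, int16_t A, int16_t B) {
      int32_t Product = int32_t(A) * int32_t(B);
      return Acc + Product;
    }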
* [RISCV] Enable tail call opt for variadic functions
  Jim Lin, 2019-09-04 (1 file, -5/+0)
  Summary: Tail call optimization can treat variadic function calls the
  same as normal function calls.
  Reviewers: mgrang, asb, lenary, lewis-revill
  Reviewed By: lenary
  Subscribers: luismarques, pzheng, rbar, johnrusso, simoncook, apazos,
  sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng,
  edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX,
  jocewei, psnobl, benna, s.egerton, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D66278
  llvm-svn: 370835
* Revert [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn
  Reid Kleckner, 2019-09-03 (7 files, -37/+10)
  This reverts r370525 (git commit 0bb1630685fba255fa93def92603f064c2ffd203)
  and also reverts r370543 (git commit
  185ddc08eed6542781040b8499ef7ad15c8ae9f4).
  The approach I took only works for functions marked `noreturn`. In
  general, a call that is not known to be noreturn may be followed by
  unreachable for other reasons. For example, there could be multiple call
  sites to a function that throws sometimes, and at some call sites, it is
  known to always throw, so it is followed by unreachable. We need to
  insert an `int3` in these cases to pacify the Windows unwinder.
  I think this probably deserves its own standalone, Win64-only fixup pass
  that runs after block placement. Implementing that will take some time,
  so let's revert to TrapUnreachable in the meantime.
  llvm-svn: 370829
* [WebAssembly] Compare functions by names in Emscripten SjLj
  Heejin Ahn, 2019-09-03 (1 file, -64/+31)
  Summary: This removes all string constants for function names and
  compares functions by string directly when needed. Many of these
  constants are used only once or twice, so the benefit of defining them
  separately is not very clear, and this actually fixes a bug.
  When we already have a `malloc` declaration which is an alias to
  something else within the module,
    @malloc = weak hidden alias i8* (i32), i8* (i32)* @dlmalloc
  (this happens when compiling with emscripten with `-s WASM_OBJECT_FILES=0`
  because all bc files are merged before being fed into `wasm-ld`, which
  runs the backend optimizations as LTO), `Module::getFunction("malloc")`
  in `canLongjmp` returns `nullptr`, because `Module::getFunction`
  dyn_casts the pointer into `Function`, but the alias is a `GlobalValue`,
  not a `Function`. This makes `canLongjmp` return false for `malloc` in
  this case, and we end up adding a lot of longjmp handling code around
  malloc. This is not only a code size increase but actually a bug, because
  `malloc` is used in the entry block when preparing setjmp tables for
  emscripten sjlj handling, and this makes the initial setjmp preparation,
  which has to happen in the entry block, move to another split block,
  which interferes with the SSA update later.
  This also adds two more functions, `getTempRet0` and `setTempRet0`, to
  the list of not-longjmp-able functions.
  Fixes https://github.com/emscripten-core/emscripten/issues/8935.
  Reviewers: sbc100
  Subscribers: mehdi_amini, jgravelle-google, hiraditya, sunfish,
  dexonsmith, dschuff, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67129
  llvm-svn: 370828
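  The failure mode can be sketched with the LLVM C++ API (a minimal
  illustration of the bug described, not the patch itself): name-based
  lookup sees the alias, while getFunction() does not.

    #include "llvm/IR/Module.h"

    // When @malloc is a GlobalAlias to @dlmalloc, getFunction("malloc")
    // returns nullptr (it only finds Functions), but getNamedValue()
    // still finds the alias, so a name-based check keeps working.
    bool moduleProvidesMalloc(const llvm::Module &M) {
      if (M.getFunction("malloc"))
        return true; // a genuine Function named malloc
      return M.getNamedValue("malloc") != nullptr; // Function or alias
    }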
* [AArch64][GlobalISel] Legalize 128 bit divisions to libcalls.
  Amara Emerson, 2019-09-03 (1 file, -0/+1)
  Now that we have the infrastructure to support s128 types as parameters,
  we can expand these to libcalls.
  Differential Revision: https://reviews.llvm.org/D66185
  llvm-svn: 370823
* [GlobalISel][CallLowering] Add support for splitting types according to calling conventions.
  Amara Emerson, 2019-09-03 (5 files, -18/+22)
  On AArch64, s128 types have to be split into s64 GPRs when passed as
  arguments. This change adds the generic support in call lowering for
  dealing with multiple registers, for incoming and outgoing args. Support
  for splitting return types is not yet implemented.
  Differential Revision: https://reviews.llvm.org/D66180
  llvm-svn: 370822
* [MC] Pass through .code16/32/64 and .syntax unified for COFF
  Reid Kleckner, 2019-09-03 (1 file, -10/+0)
  These flags should simply be passed through to the target, which will do
  the right thing. Add an MC/X86 test that uses these directives with the
  three primary object file formats and shows that they disassemble the
  same everywhere.
  There is a missing test for .code32 on Windows ARM, since I'm not sure
  exactly how to construct one.
  Fixes PR43203
  llvm-svn: 370805
* [AArch64][GlobalISel] Don't import i64imm_32bit pattern at -O0
  Jessica Paquette, 2019-09-03 (1 file, -0/+11)
  This pattern, when imported at -O0, adds an extra copy via the
  SUBREG_TO_REG. This is because the SUBREG_TO_REG is not eliminated. At
  all other opt levels, it is eliminated.
  This is a 1% geomean code size savings at -O0 on CTMark.
  Differential Revision: https://reviews.llvm.org/D67027
  llvm-svn: 370789
* [SVE][Inline-Asm] Fix -Wimplicit-fallthrough in AArch64ISelLowering.cpp
  Kerry McLaughlin, 2019-09-03 (1 file, -0/+1)
  Summary: Adds a break to the 'x' case in getRegForInlineAsmConstraint
  added by D66302, fixing the unintentional fallthrough.
  Reviewers: sdesmalen, rovka, cameron.mcinally, greened, gribozavr, ruiu
  Reviewed By: sdesmalen
  Subscribers: bjope, javed.absar, tschuett, kristof.beyls, rkruppe,
  psnobl, llvm-commits, cfe-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67095
  llvm-svn: 370769
* [X86] Merge 2 consecutive HasInt256 branches. NFCI.
  Simon Pilgrim, 2019-09-03 (1 file, -3/+2)
  llvm-svn: 370761
* [SystemZ] Recognize INLINEASM_BR in backend.
  Jonas Paulsson, 2019-09-03 (1 file, -2/+2)
  SystemZInstrInfo::analyzeBranch() needs to check for INLINEASM_BR
  instructions, or it will crash.
  Review: Ulrich Weigand
  llvm-svn: 370753
* [ARM] Ignore Implicit CPSR regs when lowering from Machine to MC operands
  David Green, 2019-09-03 (1 file, -2/+2)
  The code here seems to date back to r134705, when tablegen lowering was
  first being added. I don't believe that we need to include CPSR implicit
  operands on the MCInst. This now works more like other backends (like
  AArch64), where all implicit registers are skipped. This allows the
  AliasInst for CSELs to match correctly, as can be seen in the test
  changes.
  Differential revision: https://reviews.llvm.org/D66703
  llvm-svn: 370745
* [SystemZ] Add support for fentry.
  Jonas Paulsson, 2019-09-03 (2 files, -0/+15)
  SystemZAsmPrinter now properly emits function calls to __fentry__.
  Review: Ulrich Weigand
  llvm-svn: 370743
* [ARM] Invert CSEL predicates if the opposite is a simpler constant to materialise
  David Green, 2019-09-03 (4 files, -30/+75)
  This moves ConstantMaterializationCost into ARMBaseInstrInfo so that it
  can also be used in ISel lowering, adding codesize values to the computed
  costs so that either approximate instruction counts or codesize costs
  can be compared. It also adds a HasLowerConstantMaterializationCost,
  which compares the ConstantMaterializationCost of two values, returning
  true if the first is smaller in either instruction count or codesize,
  falling back to the other metric in the case that they are equal. This
  is used in constant CSEL lowering to invert the predicate if the
  opposite is easier to materialise.
  Differential revision: https://reviews.llvm.org/D66701
  llvm-svn: 370741
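  A minimal sketch of the comparison rule (hypothetical struct; the real
  code queries ARM subtarget costs): prefer the constant with fewer
  instructions, and break ties on code size.

    // Hypothetical materialization cost: instruction count plus size.
    struct MatCost {
      unsigned NumInsts;
      unsigned CodeBytes;
    };

    // True if A is strictly cheaper, falling back to code size when the
    // instruction counts tie; this decides whether inverting the CSEL
    // predicate puts the cheaper constant first.
    bool hasLowerConstantMaterializationCost(MatCost A, MatCost B) {
      if (A.NumInsts != B.NumInsts)
        return A.NumInsts < B.NumInsts;
      return A.CodeBytes < B.CodeBytes;
    }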