path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* AMDGPU/GlobalISel: Fix assert on load from constant address | Matt Arsenault | 2019-09-05 | 1 | -4/+4
  llvm-svn: 371006
* [AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls | Jessica Paquette | 2019-09-04 | 2 | -7/+173
  This adds support for basic sibling call lowering in AArch64. The intent here is to only handle tail calls which do not change the ABI (hence, sibling calls). At this point, it is very restricted. It does not handle:
  - Vararg calls.
  - Calls with outgoing arguments.
  - Calls whose calling conventions differ from the caller's calling convention.
  - Tail/sibling calls with BTI enabled.
  This patch adds:
  - `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent to the same function in AArch64ISelLowering.cpp (albeit with the restrictions above).
  - `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in AArch64ISelLowering.cpp.
  - `getCallOpcode`, which is exactly what it sounds like.
  Tail/sibling calls are lowered by checking if they pass target-independent tail call positioning checks, and checking if they satisfy `isEligibleForTailCallOptimization`. If they do, then a tail call instruction is emitted instead of a normal call. If we have a sibling call (which is always the case in this patch), then we do not emit any stack adjustment operations. When we go to lower a return, we check if we've already emitted a tail call. If so, then we skip the return lowering.
  For testing, this patch:
  - Adds call-translator-tail-call.ll to test which tail calls we currently lower, which ones we don't, and which ones we shouldn't.
  - Updates branch-target-enforcement-indirect-calls.ll to show that we fall back as expected.
  Differential Revision: https://reviews.llvm.org/D67189
  llvm-svn: 370996
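The restrictions above amount to a single eligibility predicate. The C++ sketch below is a simplified model of that check, not the code of `AArch64CallLowering::isEligibleForTailCallOptimization`; `CallSiteInfo`, its fields, and the `CallingConv` enum are hypothetical stand-ins for the information the real function inspects.

```cpp
#include <cstddef>

// Hypothetical calling-convention tags; LLVM's real CallingConv IDs differ.
enum class CallingConv { C, Fast, Other };

// Hypothetical summary of one call site (not an LLVM class).
struct CallSiteInfo {
  bool IsVarArg;               // the call is to a variadic function
  std::size_t NumOutgoingArgs; // arguments passed by the caller
  CallingConv CalleeCC;        // calling convention used by the call
  CallingConv CallerCC;        // calling convention of the calling function
  bool HasBTI;                 // branch target enforcement is enabled
};

// True only when none of the restrictions listed in the commit message apply.
constexpr bool isEligibleForSiblingCall(const CallSiteInfo &CS) {
  if (CS.IsVarArg)
    return false; // vararg calls are not handled yet
  if (CS.NumOutgoingArgs != 0)
    return false; // calls with outgoing arguments are not handled yet
  if (CS.CalleeCC != CS.CallerCC)
    return false; // calling conventions must match
  if (CS.HasBTI)
    return false; // BTI-enabled tail/sibling calls are not handled yet
  return true;
}
```

When the predicate holds, the lowering described above emits a tail call instruction in place of a normal call, emits no stack adjustment, and later skips the return lowering.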
* AMDGPU/GlobalISel: Select G_BITREVERSE | Matt Arsenault | 2019-09-04 | 2 | -1/+2
  llvm-svn: 370980
* GlobalISel: Add basic legalization for G_BITREVERSE | Matt Arsenault | 2019-09-04 | 1 | -1/+1
  llvm-svn: 370979
* AMDGPU: Handle frame index expansion with no free SGPRs pre gfx9 | Matt Arsenault | 2019-09-04 | 2 | -26/+58
  Since an add instruction must produce an unused carry out, this requires additional SGPRs. This can be avoided by keeping the entire offset computation in SGPRs. If one SGPR is still available, this only costs one extra mov. If none are available, the entire computation can be done in place and reversed.
  This does assume the use is a VGPR operand. This was already assumed, and we currently only select frame indexes to VALU instructions. This should probably be fixed at some point to handle more possible MIR.
  llvm-svn: 370929
* AMDGPU/GlobalISel: Make 16-bit constants legal | Matt Arsenault | 2019-09-04 | 1 | -11/+5
  This is mostly for the benefit of patterns which use 16-bit constants.
  llvm-svn: 370921
* [Hexagon] Improve generated code for test-if-bit-clear, one more time | Krzysztof Parzyszek | 2019-09-04 | 1 | -18/+29
  Adjust isel patterns after recent commit.
  Fixes https://llvm.org/PR43194.
  llvm-svn: 370913
* [ARM][ParallelDSP] SExt mul for accumulation | Sam Parker | 2019-09-04 | 1 | -5/+14
  For any unpaired muls, we accumulate them as an input to the reduction. Check the type of the mul and perform a sext if the existing accumulator input type is not the same.
  Differential Revision: https://reviews.llvm.org/D66993
  llvm-svn: 370851
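To see why the extension is needed, here is a plain C++ sketch (not the ParallelDSP pass), assuming a 16x16-bit multiply whose 32-bit product is added to a 64-bit accumulator. `accumulateSExt` and `accumulateZExt` are hypothetical names; the second exists only to show what goes wrong without sign extension.

```cpp
#include <cstdint>

// Unpaired mul: the 32-bit product is sign-extended (sext i32 -> i64)
// before being added to the 64-bit accumulator.
constexpr std::int64_t accumulateSExt(std::int64_t Acc, std::int16_t A,
                                      std::int16_t B) {
  const std::int32_t Mul = static_cast<std::int32_t>(A) * B;
  return Acc + static_cast<std::int64_t>(Mul);
}

// Counter-example: zero-extending the product loses its sign.
constexpr std::int64_t accumulateZExt(std::int64_t Acc, std::int16_t A,
                                      std::int16_t B) {
  const std::int32_t Mul = static_cast<std::int32_t>(A) * B;
  return Acc + static_cast<std::int64_t>(static_cast<std::uint32_t>(Mul));
}
```

With a negative product, only the sign-extending version produces the arithmetically correct sum.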
* [RISCV] Enable tail call opt for variadic function | Jim Lin | 2019-09-04 | 1 | -5/+0
  Summary: Tail call optimization can treat a variadic function call the same as a normal function call.
  Reviewers: mgrang, asb, lenary, lewis-revill
  Reviewed By: lenary
  Subscribers: luismarques, pzheng, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, s.egerton, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D66278
  llvm-svn: 370835
* Revert [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn | Reid Kleckner | 2019-09-03 | 7 | -37/+10
  This reverts r370525 (git commit 0bb1630685fba255fa93def92603f064c2ffd203)
  Also reverts r370543 (git commit 185ddc08eed6542781040b8499ef7ad15c8ae9f4)
  The approach I took only works for functions marked `noreturn`. In general, a call that is not known to be noreturn may be followed by unreachable for other reasons. For example, there could be multiple call sites to a function that throws sometimes, and at some call sites, it is known to always throw, so it is followed by unreachable. We need to insert an `int3` in these cases to pacify the Windows unwinder.
  I think this probably deserves its own standalone, Win64-only fixup pass that runs after block placement. Implementing that will take some time, so let's revert to TrapUnreachable in the meantime.
  llvm-svn: 370829
* [WebAssembly] Compare functions by names in Emscripten Sjlj | Heejin Ahn | 2019-09-03 | 1 | -64/+31
  Summary: This removes all string constants for function names and compares functions by string directly when needed. Many of these constants are used only once or twice, so the benefit of defining them separately is not very clear, and this actually fixes a bug.
  When we already have a `malloc` declaration which is an alias to something else within the module,
  ```
  @malloc = weak hidden alias i8* (i32), i8* (i32)* @dlmalloc
  ```
  (this happens when compiling with emscripten with `-s WASM_OBJECT_FILES=0`, because all bc files are merged before being fed into `wasm-ld`, which runs the backend optimizations as LTO), `Module::getFunction("malloc")` in `canLongjmp` returns `nullptr`, because `Module::getFunction` dyn_casts the pointer to `Function`, but the alias is a `GlobalValue`, not a `Function`. This makes `canLongjmp` return true for `malloc` in this case, and we end up adding a lot of longjmp handling code around malloc. This is not only a code size increase but actually a bug, because `malloc` is used in the entry block when preparing for setjmp tables for emscripten sjlj handling, and this makes initial setjmp preparation, which has to happen in the entry block, move to another split block, and this interferes with SSA update later.
  This also adds two more functions, `getTempRet0` and `setTempRet0`, to the list of not longjmp-able functions.
  Fixes https://github.com/emscripten-core/emscripten/issues/8935.
  Reviewers: sbc100
  Subscribers: mehdi_amini, jgravelle-google, hiraditya, sunfish, dexonsmith, dschuff, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67129
  llvm-svn: 370828
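The alias problem can be reproduced with a toy model. In this C++17 sketch, `GlobalInfo`, `findFunction` and `isKnownNotToLongjmp` are hypothetical: `findFunction` imitates the way `Module::getFunction` only yields a `Function`, so an alias named `malloc` is not found, while a plain name comparison recognises the callee whatever kind of global it is.

```cpp
#include <cstddef>
#include <string_view>

// Toy model of module globals (not LLVM's classes).
enum class GVKind { Function, Alias };

struct GlobalInfo {
  std::string_view Name;
  GVKind Kind;
};

// The situation from the commit message: @malloc is an alias of @dlmalloc.
constexpr GlobalInfo ExampleModule[] = {
    {"dlmalloc", GVKind::Function},
    {"malloc", GVKind::Alias},
};

// Stand-in for Module::getFunction: like a dyn_cast to Function, the lookup
// only succeeds when the named global is a Function. Returns the index of
// the Function, or -1 when there is none.
template <std::size_t N>
constexpr int findFunction(const GlobalInfo (&Globals)[N],
                           std::string_view Name) {
  for (std::size_t I = 0; I < N; ++I)
    if (Globals[I].Name == Name)
      return Globals[I].Kind == GVKind::Function ? static_cast<int>(I) : -1;
  return -1;
}

// Name-based check in the spirit of the fix. Only names mentioned in the
// commit message are listed; the pass's real list may contain more.
constexpr bool isKnownNotToLongjmp(std::string_view CalleeName) {
  return CalleeName == "malloc" || CalleeName == "getTempRet0" ||
         CalleeName == "setTempRet0";
}
```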
* [AArch64][GlobalISel] Legalize 128 bit divisions to libcalls. | Amara Emerson | 2019-09-03 | 1 | -0/+1
  Now that we have the infrastructure to support s128 types as parameters, we can expand these to libcalls.
  Differential Revision: https://reviews.llvm.org/D66185
  llvm-svn: 370823
* [GlobalISel][CallLowering] Add support for splitting types according to calling conventions. | Amara Emerson | 2019-09-03 | 5 | -18/+22
  On AArch64, s128 types have to be split into s64 GPRs when passed as arguments. This change adds the generic support in call lowering for dealing with multiple registers, for incoming and outgoing args.
  Support for splitting return types is not yet implemented.
  Differential Revision: https://reviews.llvm.org/D66180
  llvm-svn: 370822
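As an illustration of what the split means for an s128 argument, the sketch below breaks a 128-bit value into the two 64-bit parts that would occupy a pair of GPRs, and reassembles them. It assumes little-endian AArch64, where the low half goes in the first register of the pair, and uses the GCC/Clang `unsigned __int128` extension; `splitS128` and `mergeS128` are hypothetical helpers, not LLVM APIs.

```cpp
#include <cstdint>
#include <utility>

// Split an s128 value into {low s64, high s64} (first and second GPR,
// assuming little-endian register-pair ordering).
constexpr std::pair<std::uint64_t, std::uint64_t>
splitS128(unsigned __int128 V) {
  return {static_cast<std::uint64_t>(V),        // low 64 bits
          static_cast<std::uint64_t>(V >> 64)}; // high 64 bits
}

// Reassemble the two s64 parts on the receiving side.
constexpr unsigned __int128 mergeS128(std::uint64_t Lo, std::uint64_t Hi) {
  return (static_cast<unsigned __int128>(Hi) << 64) | Lo;
}
```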
* [MC] Pass through .code16/32/64 and .syntax unified for COFF | Reid Kleckner | 2019-09-03 | 1 | -10/+0
  These flags should simply be passed through to the target, which will do the right thing. Add an MC/X86 test that uses these directives with the three primary object file formats and shows that they disassemble the same everywhere.
  There is a missing test for .code32 on Windows ARM, since I'm not sure exactly how to construct one.
  Fixes PR43203
  llvm-svn: 370805
* [AArch64][GlobalISel] Don't import i64imm_32bit pattern at -O0 | Jessica Paquette | 2019-09-03 | 1 | -0/+11
  This pattern, when imported at -O0, adds an extra copy via the SUBREG_TO_REG. This is because the SUBREG_TO_REG is not eliminated at -O0. At all other opt levels, it is eliminated.
  This is a 1% geomean code size saving at -O0 on CTMark.
  Differential Revision: https://reviews.llvm.org/D67027
  llvm-svn: 370789
* [SVE][Inline-Asm] Fix -Wimplicit-fallthrough in AArch64ISelLowering.cpp | Kerry McLaughlin | 2019-09-03 | 1 | -0/+1
  Summary: Adds a break to the 'x' case in getRegForInlineAsmConstraint (added by D66302), fixing the unintentional fallthrough.
  Reviewers: sdesmalen, rovka, cameron.mcinally, greened, gribozavr, ruiu
  Reviewed By: sdesmalen
  Subscribers: bjope, javed.absar, tschuett, kristof.beyls, rkruppe, psnobl, llvm-commits, cfe-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67095
  llvm-svn: 370769
* [X86] Merge 2 consecutive HasInt256 branches. NFCI. | Simon Pilgrim | 2019-09-03 | 1 | -3/+2
  llvm-svn: 370761
* [SystemZ] Recognize INLINEASM_BR in backend. | Jonas Paulsson | 2019-09-03 | 1 | -2/+2
  SystemZInstrInfo::analyzeBranch() needs to check for INLINEASM_BR instructions, or it will crash.
  Review: Ulrich Weigand
  llvm-svn: 370753
* [ARM] Ignore Implicit CPSR regs when lowering from Machine to MC operands | David Green | 2019-09-03 | 1 | -2/+2
  The code here seems to date back to r134705, when tablegen lowering was first being added. I don't believe that we need to include CPSR implicit operands on the MCInst. This now works more like other backends (like AArch64), where all implicit registers are skipped.
  This allows the AliasInst for CSELs to match correctly, as can be seen in the test changes.
  Differential revision: https://reviews.llvm.org/D66703
  llvm-svn: 370745
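A minimal model of the lowering rule, assuming a stripped-down operand record: implicit register operands are filtered out and everything else is kept. `OperandInfo`, `keepOnMCInst`, `numLoweredOperands` and `ExampleOps` are hypothetical and are not LLVM's `MachineOperand`/`MCOperand` API.

```cpp
#include <cstddef>

// Hypothetical, stripped-down operand record.
struct OperandInfo {
  bool IsReg;      // register operand?
  bool IsImplicit; // implicit use/def (e.g. CPSR)?
};

// The rule described above: every implicit register operand is dropped.
constexpr bool keepOnMCInst(const OperandInfo &MO) {
  return !(MO.IsReg && MO.IsImplicit);
}

// Counts how many operands survive lowering.
template <std::size_t N>
constexpr std::size_t numLoweredOperands(const OperandInfo (&Ops)[N]) {
  std::size_t Count = 0;
  for (std::size_t I = 0; I < N; ++I)
    if (keepOnMCInst(Ops[I]))
      ++Count;
  return Count;
}

// Hypothetical instruction: three explicit registers, one explicit
// non-register operand (e.g. a condition code) and one implicit register use.
constexpr OperandInfo ExampleOps[] = {
    {true, false}, {true, false}, {true, false}, {false, false}, {true, true}};
```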
* [SystemZ] Add support for fentry. | Jonas Paulsson | 2019-09-03 | 2 | -0/+15
  SystemZAsmPrinter now properly emits function calls to __fentry__.
  Review: Ulrich Weigand
  llvm-svn: 370743
* [ARM] Invert CSEL predicates if the opposite is a simpler constant to materialise | David Green | 2019-09-03 | 4 | -30/+75
  This moves ConstantMaterializationCost into ARMBaseInstrInfo so that it can also be used in ISel Lowering, adding codesize values to the computed costs, to be able to compare either approximate instruction counts or codesize costs.
  It also adds a HasLowerConstantMaterializationCost, which compares the ConstantMaterializationCost of two values, returning true if the first is smaller either in instruction count/codesize, or falling back to the other in the case that they are equal.
  This is used in constant CSEL lowering to invert the predicate if the opposite is easier to materialise.
  Differential revision: https://reviews.llvm.org/D66701
  llvm-svn: 370741
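The comparison rule described above (primary metric first, the other metric only as a tie-breaker) can be sketched as follows. `MaterializationCost` and `hasLowerCost` are hypothetical names, the numbers in the tests are arbitrary rather than real ARM costs, and `ForCodesize` stands in for whatever decides which metric is primary.

```cpp
// Hypothetical cost record: approximate instruction count and code size in
// bytes needed to materialise one constant.
struct MaterializationCost {
  unsigned Instrs;
  unsigned Bytes;
};

// Returns true if A is cheaper than B. The primary metric is code size when
// ForCodesize is set and instruction count otherwise; the other metric is
// only consulted when the primary metrics are equal.
constexpr bool hasLowerCost(MaterializationCost A, MaterializationCost B,
                            bool ForCodesize) {
  const unsigned APrimary = ForCodesize ? A.Bytes : A.Instrs;
  const unsigned BPrimary = ForCodesize ? B.Bytes : B.Instrs;
  if (APrimary != BPrimary)
    return APrimary < BPrimary;
  const unsigned ASecondary = ForCodesize ? A.Instrs : A.Bytes;
  const unsigned BSecondary = ForCodesize ? B.Instrs : B.Bytes;
  return ASecondary < BSecondary;
}
```

In the CSEL lowering, a comparison of this shape would decide whether swapping the two constants and inverting the predicate pays off.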
* [ARM] Generate 8.1-m CSINC, CSNEG and CSINV instructions. | David Green | 2019-09-03 | 6 | -1/+92
  Arm 8.1-M adds a number of related CSEL instructions, including CSINC, CSNEG and CSINV. These choose between two values given the content in CPSR and a condition, performing an increment, negation or inverse of the false value.
  This adds some selection for them, either from constant values or patterns. It does not include CSEL directly, which is currently not always making code better. It is still useful, but we will have to check more carefully where it should and shouldn't be used.
  Code by Ranjeet Singh and Simon Tatham, with some modifications from me.
  Differential revision: https://reviews.llvm.org/D66483
  llvm-svn: 370739
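The semantics described above, written out as a C++ sketch. `Cond` stands for whether the condition holds given the current CPSR flags; the function names are hypothetical, and 32-bit wrap-around arithmetic is assumed.

```cpp
#include <cstdint>

// True value passes through; the false value is incremented.
constexpr std::uint32_t csinc(bool Cond, std::uint32_t T, std::uint32_t F) {
  return Cond ? T : F + 1u;
}

// True value passes through; the false value is negated (two's complement).
constexpr std::uint32_t csneg(bool Cond, std::uint32_t T, std::uint32_t F) {
  return Cond ? T : 0u - F;
}

// True value passes through; the false value is bitwise inverted.
constexpr std::uint32_t csinv(bool Cond, std::uint32_t T, std::uint32_t F) {
  return Cond ? T : ~F;
}
```

One consequence: a select such as `c ? K : K + 1` needs only one materialised constant, because it equals `csinc(c, K, K)`.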
* [mips] Switch to the `.text` section after emitting asm file preamble | Simon Atanasyan | 2019-09-03 | 1 | -0/+4
  Currently, the last `.section` directive in the MIPS asm file preamble is `.section .mdebug.abi`. If assembler code injected, for example, by the LLVM `module asm` or the C `__asm` directives does not explicitly switch to the `.text` section, it goes to the `.mdebug.abi` section. This might be unexpected to the user, and in fact it breaks building some existing code, such as FreeBSD libc [1].
  The patch forces switching to the `.text` section after emitting the MIPS assembler file preamble.
  [1] https://bugs.llvm.org/show_bug.cgi?id=43119
  Fix PR43119.
  Differential Revision: https://reviews.llvm.org/D67014
  llvm-svn: 370735
* [ARM] Fix MVE ldst offset ranges | David Green | 2019-09-03 | 1 | -19/+18
  We were using isShiftedInt<7, Shift>(RHSC) to detect the ranges of offsets to fold into MVE loads/stores. The instructions actually take a 7-bit unsigned integer which is either added or subtracted, so the check should be something more like isShiftedUInt<7, Shift>(abs(RHSC)).
  Instead, I've changed this to use the isScaledConstantInRange method, the same as in SelectT2AddrModeImm7Offset used by pre/post inc, which seemed to already be getting this correct.
  Differential revision: https://reviews.llvm.org/D66997
  llvm-svn: 370731
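To make the difference concrete, here are the two range checks reimplemented as standalone C++. These are our own sketches, not LLVM's `isShiftedInt`/`isShiftedUInt` templates or `isScaledConstantInRange`: the old check accepts a scaled offset in [-64, 63], while the encoding holds a 7-bit unsigned magnitude that is added or subtracted, so the scaled magnitude may be anywhere from 0 to 127.

```cpp
#include <cstdint>

// Old check (same effect as isShiftedInt<7, Shift>): Off must be a multiple
// of 2^Shift and Off / 2^Shift must fit a signed 7-bit field, [-64, 63].
constexpr bool fitsSigned7Scaled(std::int64_t Off, unsigned Shift) {
  const std::int64_t Scale = std::int64_t{1} << Shift;
  if (Off % Scale != 0)
    return false;
  const std::int64_t Scaled = Off / Scale;
  return Scaled >= -64 && Scaled <= 63;
}

// Check matching the encoding: a 7-bit unsigned magnitude that is either
// added or subtracted, i.e. |Off| / 2^Shift in [0, 127].
constexpr bool fitsUnsignedMagnitude7Scaled(std::int64_t Off, unsigned Shift) {
  const std::int64_t Scale = std::int64_t{1} << Shift;
  const std::int64_t Mag = Off < 0 ? -Off : Off;
  if (Mag % Scale != 0)
    return false;
  return Mag / Scale <= 127;
}
```

For example, with `Shift` = 0 an offset of 127 (or -127) is encodable, but the old check rejects it.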
* [ARM][MVE] Decoding of VMSR doesn't diagnose some unpredictable encodings | Oliver Stannard | 2019-09-03 | 1 | -25/+29
  Decoding of VMSR doesn't diagnose some unpredictable encodings, as the unpredictable bits are not correctly set.
  Diff-reduce this instruction's internals WRT VMRS so I can see the differences better. Mostly this is s/src/Rt/g.
  Fill in the "should-be-(0)" bits.
  Designate the Unpredictable{} bits for both VMRS and VMSR.
  Patch by Mark Murray!
  Differential revision: https://reviews.llvm.org/D66938
  llvm-svn: 370729
* Bug fix on function epilog optimization (ARM backend) | Oliver Stannard | 2019-09-03 | 1 | -2/+3
  To save an 'add sp,#val' instruction by adding registers to the final pop instruction, the first register transferred by this pop instruction needs to be found. If the function to be optimized has a non-void return value, the operand list contains r0 (implicit), which prevents the optimization from taking place. Therefore, implicit register references should be skipped in the search loop, because these registers are never popped from the stack.
  Patch by Rainer Herbertz (rOptimizer)!
  Differential revision: https://reviews.llvm.org/D66730
  llvm-svn: 370728
* [ARM] Select vmla | Sam Tebbs | 2019-09-03 | 1 | -0/+15
  This patch adds vmla selection.
  Differential revision: https://reviews.llvm.org/D66297
  llvm-svn: 370704
* [X86] Simplify the setOperationAction handling for fp_to_uint by improving the Custom handler a bit. | Craig Topper | 2019-09-03 | 2 | -19/+22
  This merges the 32-bit and 64-bit mode code to just use Custom for both i32 and i64. We already had most of the handling in the custom handling due to AVX512 having legal fp_to_uint. We just needed to add the i32->i64 promotion handling.
  Refactor the fp_to_uint code in the custom handler to reduce the number of times we check things.
  Tweak cost model tables to match the default handling we were getting due to Expand before.
  llvm-svn: 370700
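The i32->i64 promotion relies on every uint32 value fitting in a signed int64: a signed 64-bit conversion (which x86 provides through CVTTSD2SI with a 64-bit destination, even without AVX-512) followed by truncation yields the unsigned 32-bit result. A C++ sketch of that idea, with a hypothetical function name; inputs outside [0, 2^32) are not handled, since `fp_to_uint` has no defined result for them.

```cpp
#include <cstdint>

// fp_to_uint i32, promoted: convert with a signed 64-bit conversion, then
// truncate to 32 bits. Valid for 0 <= X < 2^32.
constexpr std::uint32_t fpToUI32ViaI64(double X) {
  return static_cast<std::uint32_t>(static_cast<std::int64_t>(X));
}
```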
* [X86] Don't use Expand for i32 fp_to_uint on SSE1/2 targets on 32-bit target. | Craig Topper | 2019-09-03 | 1 | -13/+7
  Use Custom lowering instead. Fall back to default expansion only when the scalar FP type belongs in an XMM register.
  This improves lowering for i32 to fp80, and also i32 to double on SSE1 only.
  llvm-svn: 370699
* [X86] Custom promote i32->f80 uint_to_fp on AVX512 64-bit targets. | Craig Topper | 2019-09-03 | 1 | -8/+7
  Reuse the same code to promote all i32 uint_to_fp on 64-bit targets to simplify the X86ISelLowering constructor.
  llvm-svn: 370693
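The same promotion idea in the other direction: zero-extending the i32 to i64 gives a non-negative value, so a signed 64-bit integer-to-FP conversion is exact for every input. In this C++ sketch, `long double` stands in for f80 (true for GCC/Clang on x86 Linux, not on every platform), and the function name is hypothetical.

```cpp
#include <cstdint>

// uint_to_fp i32 -> f80, promoted: zero-extend to 64 bits (always
// non-negative as int64), then use a signed 64-bit conversion.
constexpr long double uiToF80ViaI64(std::uint32_t X) {
  return static_cast<long double>(
      static_cast<std::int64_t>(static_cast<std::uint64_t>(X)));
}
```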
* [X86] Enable fp128 as a legal type with SSE1 rather than with MMX. | Craig Topper | 2019-09-02 | 1 | -2/+2
  FP128 values are passed in xmm registers, so they should be associated with an SSE feature rather than MMX, which uses a different set of registers.
  llc enables sse1 and sse2 by default with x86_64, but does not enable mmx. Clang enables all 3 features by default.
  I've tried to add command lines to test with -sse where possible, but any test that returns a value in an xmm register fails with a fatal error with -sse, since we have no defined ABI for that scenario.
  llvm-svn: 370682
* [ARM] Use MQPR not QPR for MVE registers | David Green | 2019-09-02 | 3 | -96/+98
  We should be using MQPR, and if we don't, we can get COPYs and PHIs created for QPR. These get folded into instructions, failing verification checks.
  Differential revision: https://reviews.llvm.org/D66214
  llvm-svn: 370676
* [SystemZ] Support constrained fpto[su]i intrinsics | Ulrich Weigand | 2019-09-02 | 3 | -16/+32
  Now that constrained fpto[su]i intrinsics are available, add codegen support to the SystemZ backend.
  In addition to pure back-end changes, I've also needed to add the strict_fp_to_[su]int and any_fp_to_[su]int pattern fragments in the obvious way.
  llvm-svn: 370674
* [SVE][Inline-Asm] Support for SVE asm operands | Kerry McLaughlin | 2019-09-02 | 4 | -7/+89
  Summary: Adds the following inline asm constraints for SVE:
  - w: SVE vector register with full range, Z0 to Z31
  - x: Restricted to registers Z0 to Z15 inclusive.
  - y: Restricted to registers Z0 to Z7 inclusive.
  This change also adds the "z" modifier to interpret a register as an SVE register.
  Not all of the bitconvert patterns added by this patch are used, but they have been included here for completeness.
  Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, rengolin, cameron.mcinally, greened
  Reviewed By: sdesmalen
  Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D66302
  llvm-svn: 370673
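The register ranges of the three constraints can be captured in a small lookup. `ZRegRange`, `sveConstraintRange` and `constraintAllowsZReg` are hypothetical helpers written for illustration; only the ranges themselves come from the commit message.

```cpp
// Inclusive range of SVE Z registers, Z<First> to Z<Last>.
struct ZRegRange {
  unsigned First;
  unsigned Last;
};

constexpr ZRegRange sveConstraintRange(char Constraint) {
  switch (Constraint) {
  case 'w':
    return {0, 31}; // full range
  case 'x':
    return {0, 15};
  case 'y':
    return {0, 7};
  default:
    return {1, 0}; // empty range: not one of these SVE constraints
  }
}

constexpr bool constraintAllowsZReg(char Constraint, unsigned Z) {
  const ZRegRange R = sveConstraintRange(Constraint);
  return R.First <= Z && Z <= R.Last;
}
```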
* [X86] getPMOVMSKB - add MVT::v64i8 handling and remove from combineBitcastvxi1. NFCI. | Simon Pilgrim | 2019-09-02 | 1 | -11/+12
  llvm-svn: 370670
* Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0" | Jay Foad | 2019-09-02 | 2 | -5/+2
  Summary: D61491 caused us to use relocs when they're not strictly necessary, to refer to symbols in the text section. This is a pessimization and it's a problem for some loaders that don't support relocs yet.
  Reviewers: nhaehnle, arsenm, tpr
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65813
  llvm-svn: 370667
* [AMDGPU][MC][GFX10] Corrected constant bus checks to exclude null | Dmitry Preobrazhensky | 2019-09-02 | 1 | -3/+6
  See AMD SWDEV-157286
  Reviewers: atamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D65229
  llvm-svn: 370665
* [AMDGPU][MC][GFX10] Enabled null with 64-bit operands | Dmitry Preobrazhensky | 2019-09-02 | 1 | -0/+2
  See Bug 42745: https://bugs.llvm.org/show_bug.cgi?id=42745
  Reviewers: atamazov, arsenm
  https://reviews.llvm.org/D65231
  llvm-svn: 370660
* [AMDGPU][MC][GFX10] Corrected constant bus limit for 64-bit shift instructions | Dmitry Preobrazhensky | 2019-09-02 | 1 | -4/+23
  See bug 42744: https://bugs.llvm.org/show_bug.cgi?id=42744
  Reviewers: atamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D65228
  llvm-svn: 370652
* [X86][BtVer2] Fix latency and throughput of conditional SIMD store instructions. | Andrea Di Biagio | 2019-09-02 | 12 | -29/+100
  On BtVer2, conditional SIMD stores are heavily microcoded. The latency is directly proportional to the number of packed elements extracted from the input vector. Also, according to micro-benchmarks, most of the computation seems to be done in the integer unit. Only a minority of the uOPs is executed by the FPU.
  The observed behaviour on the FPU looks similar to this:
  - The input MASK value is moved to the Integer Unit [a VMOVMSK-like uOP executed on JFPU0].
  - In parallel, each element of the input XMM/YMM is extracted and then sent to the Integer Unit through JFPU1.
  As expected, a (conditional) store is executed for every extracted element. Interestingly, a (speculative) load is executed for every extracted element too. It is as if a "LOAD - BIT_EXTRACT - CMOV" sequence of uOPs is repeated by the integer unit for every conditionally stored element.
  VMASKMOVDQU is a special case: the number of speculative loads is always 2 (presumably, one load per quadword). That means extra shifts and masking are performed on (one of) the loaded quadwords before each conditional store (that also explains the big number of non-FP uOPs retired).
  This patch replaces the existing writes for conditional SIMD stores (i.e. WriteFMaskedStore and WriteFMaskedStoreY) with the following new writes:
  - WriteFMaskedStore32 [ XMM Packed Single ]
  - WriteFMaskedStore32Y [ YMM Packed Single ]
  - WriteFMaskedStore64 [ XMM Packed Double ]
  - WriteFMaskedStore64Y [ YMM Packed Double ]
  Added a wrapper class named X86SchedWriteMaskMove in X86Schedule.td to describe both RM and MR variants for conditional SIMD moves in a single tablegen definition. Instances of that class are then passed as input to multiclass avx_movmask_rm when constructing MASKMOVPS/PD definitions.
  Since this patch introduces new writes, I had to update all the X86 scheduling models.
  Differential Revision: https://reviews.llvm.org/D66801
  llvm-svn: 370649
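For reference, the architectural behaviour being scheduled: a 128-bit VMASKMOVPS-style conditional store writes element i only when the most significant bit of mask element i is set, and leaves the other memory elements untouched. The C++ sketch below is a semantic model only (no timing), with a hypothetical function name; handling one element at a time loosely mirrors the per-element extract-and-store sequence the measurements above suggest.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Element I of Src is stored when the MSB of Mask[I] is set; other elements
// of Mem keep their old values. Mem is taken and returned by value so the
// function can be constexpr.
constexpr std::array<float, 4>
maskedStorePS(std::array<float, 4> Mem,
              const std::array<std::int32_t, 4> &Mask,
              const std::array<float, 4> &Src) {
  for (std::size_t I = 0; I < 4; ++I)
    if (Mask[I] < 0) // MSB set <=> negative as a signed 32-bit value
      Mem[I] = Src[I];
  return Mem;
}
```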
* [X86] combineHorizontalPredicateResult - pull out repeated getTargetLoweringInfo() calls. NFCI. | Simon Pilgrim | 2019-09-02 | 1 | -2/+2
  llvm-svn: 370637
* [X86] Add initial support for unfolding broadcast loads from arithmetic instructions to enable LICM hoisting of the load | Craig Topper | 2019-09-01 | 3 | -10/+155
  MachineLICM can hoist an invariant load, but if that load is folded, it needs to be unfolded. On AVX512, this load is sometimes a broadcast load, which we were previously unable to unfold. This patch adds initial support for that with a very basic list of supported instructions as a starting point.
  Differential Revision: https://reviews.llvm.org/D67017
  llvm-svn: 370620
* [X86][AVX] Rename + cleanup lowerShuffleAsLanePermuteAndBlend. NFCI. | Simon Pilgrim | 2019-09-01 | 1 | -28/+30
  Rename to lowerShuffleAsLanePermuteAndShuffle to make it clear that not just blends are performed.
  Clean up the in-lane shuffle mask generation to make it more obvious what's going on.
  Some prep work noticed while investigating the poor shuffle code mentioned in D66004.
  llvm-svn: 370613
* Fix shadow variable warning. NFCI. | Simon Pilgrim | 2019-09-01 | 1 | -3/+3
  llvm-svn: 370610
* [ARM] Remove MVE masked loads/stores | David Green | 2019-09-01 | 3 | -127/+0
  These were never enabled correctly and are causing other problems. Taking them out for the moment, whilst we work on the issues.
  This reverts r370329.
  llvm-svn: 370607
* AMDGPU: Remove unused custom node definition | Matt Arsenault | 2019-09-01 | 3 | -12/+0
  llvm-svn: 370603
* [X86] Replace some COPY_TO_REGCLASS from GR32/GR64 to VR128 in isel patterns with VMOVDI2PDIrr/VMOV64toPQIrr. | Craig Topper | 2019-08-31 | 1 | -22/+18
  This is what the copies will eventually be turned into. We don't use COPY_TO_REGCLASS for scalar_to_vector patterns, so we should use the real instruction here too.
  llvm-svn: 370601
* [X86] Compress the flag bits in the folding tables to make room for more bits in an upcoming patch. | Craig Topper | 2019-08-31 | 2 | -13/+18
  llvm-svn: 370600
* [NFC] Fixed -Wdocumentation warning | David Bolvansky | 2019-08-31 | 1 | -8/+8
  /srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.src/lib/Target/AMDGPU/AMDGPUGenRegisterBankInfo.def:98:1: warning: not a Doxygen trailing comment [-Wdocumentation]
  1 warning generated.
  llvm-svn: 370596
* [X86] EltsFromConsecutiveLoads - Don't confuse elt count with vector element count (PR43170) | Simon Pilgrim | 2019-08-31 | 1 | -11/+16
  EltsFromConsecutiveLoads was assuming that the number of input elts was the same as the number of elements in the output vector type when creating a zeroing shuffle, causing an assert when subvectors were being combined instead of just scalars.
  llvm-svn: 370592