summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* CMake: Make most target symbols hidden by defaultTom Stellard2020-01-14105-105/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l 36221 nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439
* [BranchAlign] Add master --x86-branches-within-32B-boundaries flagPhilip Reames2020-01-141-2/+23
| | | | | | | | | | | | This flag was originally part of D70157, but was removed as we carved away pieces of the review. Since we have the nop support checked in, and it appears mature(*), I think it's time to add the master flag. For now, it will default to nop padding, but once the prefix padding support lands, we'll update the defaults. (*) I can now confirm that downstream testing of the changes which have landed to date - nop padding and compiler support for suppressions - is passing all of the functional testing we've thrown at it. There might still be something lurking, but we've gotten enough coverage to be confident of the basic approach. Note that the new flag can be used either when assembling an .s file, or when using the integrated assembler directly from the compiler. The later will use all of the suppression mechanism and should always generate correct code. We don't yet have assembly syntax for the suppressions, so passing this directly to the assembler w/a raw .s file may result in broken code. Use at your own risk. Also note that this isn't the wiring for the clang option. I think the most recent review for that is D72227, but I've lost track, so that might be off. Differential Revision: https://reviews.llvm.org/D72738
* [Win64] Handle FP arguments more gracefully under -mno-sseReid Kleckner2020-01-142-22/+31
| | | | | | | | | | | | | | | | | Pass small FP values in GPRs or stack memory according the the normal convention. This is what gcc -mno-sse does on Win64. I adjusted the conditions under which we emit an error to check if the argument or return value would be passed in an XMM register when SSE is disabled. This has a side effect of no longer emitting an error for FP arguments marked 'inreg' when targetting x86 with SSE disabled. Our calling convention logic was already assigning it to FP0/FP1, and then we emitted this error. That seems unnecessary, we can ignore 'inreg' and compile it without SSE. Reviewers: jyknight, aemerson Differential Revision: https://reviews.llvm.org/D70465
* [X86] Drop an unneeded FIXME. NFCCraig Topper2020-01-141-1/+0
| | | | The extload on X87 is free.
* [X86] Swap the 0 and the fudge factor in the constant pool for the 32-bit ↵Craig Topper2020-01-141-4/+4
| | | | | | | | | | | | mode i64->f32/f64/f80 uint_to_fp algorithm. This allows us to generate better code for selecting the fixup to load. Previously when the sign was set we had to load offset 0. And when it was clear we had to load offset 4. This required a testl, setns, zero extend, and finally a mul by 4. By switching the offsets we can just shift the sign bit into the lsb and multiply it by 4.
* [codegen,amdgpu] Enhance MIR DIE and re-arrange it for AMDGPU.Michael Liao2020-01-141-1/+1
| | | | | | | | | | | | | | | | | | | Summary: - `dead-mi-elimination` assumes MIR in the SSA form and cannot be arranged after phi elimination or DeSSA. It's enhanced to handle the dead register definition by skipping use check on it. Once a register def is `dead`, all its uses, if any, should be `undef`. - Re-arrange the DIE in RA phase for AMDGPU by placing it directly after `detect-dead-lanes`. - Many relevant tests are refined due to different register assignment. Reviewers: rampitec, qcolombet, sunfish Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72709
* [AArch64][GlobalISel]: Support @llvm.{return,frame}address selection.Amara Emerson2020-01-141-1/+38
| | | | | | | | | These intrinsics expand to a variable number of instructions so just like in ISelLowering.cpp we use custom code to deal with them. Committing Tim's original patch. Differential Revision: https://reviews.llvm.org/D65656
* [SVE] Add patterns for MUL immediate instruction.Danilo Carvalho Grael2020-01-142-2/+7
| | | | | | | | | | | | Summary: Add the missing MUL pattern for integer immediate instructions. Reviewers: sdesmalen, huntergr, efriedma, c-rhodes, kmclaughlin Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits, amehsan Tags: #llvm Differential Revision: https://reviews.llvm.org/D72654
* [RISCV] Allow shrink wrapping for RISC-Vlewis-revill2020-01-141-4/+16
| | | | | | | | | Enabling shrink wrapping requires ensuring the insertion point of the epilogue is correct for MBBs without a terminator, in which case the instruction to adjust the stack pointer is the last instruction in the block. Differential Revision: https://reviews.llvm.org/D62190
* [X86] Directly emit a BROADCAST_LOAD from constant pool in ↵Craig Topper2020-01-141-2/+12
| | | | | | | | lowerUINT_TO_FP_vXi32 to avoid double loads seen in D71971 By directly emitting the constants as a constant pool load we seem to avoid the build_vector/extract_subvector combines that resulted in the duplicate loads we had before. Differential Revision: https://reviews.llvm.org/D72307
* [AIX][XCOFF] Supporting the ReadOnlyWithRel SectionKnddiggerlin2020-01-141-2/+1
| | | | | | | | | | SUMMARY: In this patch we put the global variable in a Csect which's SectionKind is "ReadOnlyWithRel" into Data Section. Reviewers: hubert.reinterpretcast,jasonliu,Xiangling_L Subscribers: wuzish, nemanjai, hiraditya Differential Revision: https://reviews.llvm.org/D72461
* [ARM][MVE] VTP Block Pass fixSjoerd Meijer2020-01-141-2/+2
| | | | | | | | | | Fix a missing and broken test: 2 VPT blocks predicated on the same VCMP instruction that can be folded. The problem was that for each VPT block, we record the predicate statements with a list, but the same instruction was added twice. Thus, we were running in an assert trying to remove the same instruction twice. To avoid this the instructions are now recorded with a set. Differential Revision: https://reviews.llvm.org/D72699
* [AArch64] Fix save register pairing for Windows AAPCSSanne Wouda2020-01-141-4/+16
| | | | | | | | | | | | | | | | | | | | | | Summary: On Windows, when a function does not have an unwind table (for example, EH filtering funclets), we don't correctly pair FP and LR to form the frame record in all circumstances. Fix this by invalidating a pair when the second register is FP when compiling for Windows, even when CFI is not needed. Fixes PR44271 introduced by D65653. Reviewers: efriedma, sdesmalen, rovka, rengolin, t.p.northover, thegameg, greened Reviewed By: rengolin Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71754
* [AIX] ExternalSymbolSDNode loweringXiangling Liao2020-01-141-24/+64
| | | | | | | | For memcpy/memset/memmove etc., replace ExternalSymbolSDNode with a MCSymbolSDNode, which have a prefix dot before function name as entry point symbol. Differential Revision: https://reviews.llvm.org/D70718
* Make helper functions static or move them into anonymous namespaces. NFC.Benjamin Kramer2020-01-142-4/+4
|
* [ARM,MVE] Use the new Tablegen `defvar` and `if` statements.Simon Tatham2020-01-141-253/+232
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This cleans up a lot of ugly `foreach` bodges that I've been using to work around the lack of those two language features. Now they both exist, I can make then all into something more legible! In particular, in the common pattern in `ARMInstrMVE.td` where a multiclass defines an `Instruction` instance plus one or more `Pat` that select it, I've used a `defvar` to wrap `!cast<Instruction>(NAME)` so that the patterns themselves become a little more legible. Replacing a `foreach` with a `defvar` removes a level of block structure, so several pieces of code have their indentation changed by this patch. Best viewed with whitespace ignored. NFC: the output of `llvm-tblgen -print-records` on the two affected Tablegen sources is exactly identical before and after this change, so there should be no effect at all on any of the other generated files. Reviewers: MarkMurrayARM, miyuki Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, dmgreen, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72690
* [ARM][LowOverheadLoops] Allow all MVE instrs.Sam Parker2020-01-141-21/+18
| | | | | | | | | | | | | | We have a whitelist of instructions that we allow when tail predicating, since these are trivial ones that we've deemed need no special handling. Now change ARMLowOverheadLoops to allow the non-trivial instructions if they're contained within a valid VPT block. Since a valid block is one that is predicated upon the VCTP so we know that these non-trivial instructions will still behave as expected once the implicit predication is used instead. This also fixes a previous test failure. Differential Revision: https://reviews.llvm.org/D72509
* [ARM][LowOverheadLoops] Change predicate inspectionSam Parker2020-01-141-26/+27
| | | | | | | | | | Use the already provided helper function to get the operand type so that we can detect whether the vpr is being used as a predicate or not. Also use existing helpers to get the predicate indices when we converting the vpt blocks. This enables us to support both types of vpr predicate operand. Differential Revision: https://reviews.llvm.org/D72504
* [ARM][Thumb2] Fix ADD/SUB invalid writes to SPDiogo Sampaio2020-01-147-86/+351
| | | | | | | | | | | | | | | | | | | | Summary: This patch fixes pr23772 [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80". The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP. So the above instructions was unpredictable. To enforce that the instruction t2(ADD|SUB)ri does not write to SP we now enforce the destination register to be rGPR (That exclude PC and SP). Different than the ARM specification, that defines one instruction that can read from SP, and one that can't, here we inserted one that can't write to SP, and other that can only write to SP as to reuse most of the hard-coded size optimizations. When performing this change, it uncovered that emitting Thumb2 Reg plus Immediate could not emit all variants of ADD SP, SP #imm instructions before so it was refactored to be able to. (see test/CodeGen/Thumb2/mve-stacksplot.mir where we use a subw sp, sp, Imm12 variant ) It also uncovered a disassembly issue of adr.w instructions, that were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt). Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma, andreadb Reviewed By: efriedma Subscribers: gbedwell, john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70680
* [ARM][MVE] Disallow VPSEL for tail predicationSam Parker2020-01-142-4/+16
| | | | | | | | | | | | | | | | | | | Due to the current way that we collect predicated instructions, we can't easily handle vpsel in tail predicated loops. There are a couple of issues: 1) It will use the VPR as a predicate operand, but doesn't have to be instead a VPT block, which means we can assert while building up the VPT block because we don't find another VPST to being a new one. 2) VPSEL still requires a VPR operand even after tail predicating, which means we can't remove it unless there is another instruction, such as vcmp, that can provide the VPR def. The first issue should be a relatively simple fix in the logic of the LowOverheadLoops pass, whereas the second will require us to represent the 'implicit' tail predication with an explicit value. Differential Revision: https://reviews.llvm.org/D72629
* [ARM][MVE] Masked gathers from base + vector of offsetsAnna Welker2020-01-141-38/+162
| | | | | | | | Enables the masked gather pass to create a masked gather loading from a base and vector of offsets. This also enables v8i16 and v16i8 gather loads. Differential Revision: https://reviews.llvm.org/D72330
* [AMDGPU] Model distance to instruction in bundleStanislav Mekhanoshin2020-01-141-5/+17
| | | | | | | This change allows to model the height of the instruction within a bundle for latency adjustment purposes. Differential Revision: https://reviews.llvm.org/D72669
* [AMDGPU] Fix getInstrLatency() always returning 1Stanislav Mekhanoshin2020-01-142-3/+7
| | | | | | | | We do not have InstrItinerary so generic getInstLatency() was always defaulting to return 1 cycle. We need to use TargetSchedModel instead to compute an instruction's latency. Differential Revision: https://reviews.llvm.org/D72655
* [X86] Copy the nofpexcept flag when folding a load into an instruction using ↵Craig Topper2020-01-131-0/+4
| | | | the load folding tables./
* [GlobalISel] Change representation of shuffle masks in MachineOperand.Eli Friedman2020-01-131-6/+3
| | | | | | | | | | | | We're planning to remove the shufflemask operand from ShuffleVectorInst (D72467); fix GlobalISel so it doesn't depend on that Constant. The change to prelegalizercombiner-shuffle-vector.mir happens because the input contains a literal "-1" in the mask (so the parser/verifier weren't really handling it properly). We now treat it as equivalent to "undef" in all contexts. Differential Revision: https://reviews.llvm.org/D72663
* [X86][Disassembler] Fix a bug when disassembling an empty stringFangrui Song2020-01-131-1/+3
| | | | | | | | | | | | | readPrefixes() assumes insn->bytes is non-empty. The code path is not exercised in llvm-mc because llvm-mc does not feed empty input to MCDisassembler::getInstruction(). This bug is uncovered by a5994c789a2982a770254ae1607b5b4cb641f73c. An empty string did not crash before because the deleted regionReader() allowed UINT64_C(-1) as insn->readerCursor. Bytes.size() <= Address -> R->Base 0 <= UINT64_C(-1) - UINT32_C(-1)
* AMDGPU/GlobalISel: Select llvm.amdgcn.ds.ordered.{add|swap}Matt Arsenault2020-01-132-0/+88
|
* AMDGPU/GlobalISel: Set insert point after waterfall loopMatt Arsenault2020-01-131-2/+3
| | | | | | | | | The current users of the waterfall loop utility functions do not make use of the restored original insert point. The insertion is either done, or they set the insert point somewhere else. A future change will want to insert instructions after the waterfall loop, but figuring out the point after the loop is more difficult than ensuring the insert point is there after the loop.
* AMDGPU/GlobalISel: Fix branch targets when emitting SI_IFMatt Arsenault2020-01-131-7/+30
| | | | | | | | The branch target needs to be changed depending on whether there is an unconditional branch or not. Loops also need to be similarly fixed, but compiling a simple testcase end to end requires another set of patches that aren't upstream yet.
* AMDGPU/GlobalISel: Simplify assertMatt Arsenault2020-01-131-11/+3
|
* [Scheduler] Remove superfluous casts. NFCDavid Green2020-01-131-1/+1
|
* [AArch64][SVE] Add patterns for some arith SVE instructions.Danilo Carvalho Grael2020-01-135-10/+67
| | | | | | | | | | | Summary: Add patterns for the following instructions: - smax, smin, umax, umin Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin Subscribers: amehsan Differential Revision: https://reviews.llvm.org/D71779
* [RISCV] Handle globals and block addresses in asm operandsLuís Marques2020-01-131-0/+8
| | | | | | | | | | Summary: These seem to be the machine operand types currently needed by the RISC-V target. Reviewers: asb, lenary Reviewed By: lenary Tags: #llvm Differential Revision: https://reviews.llvm.org/D72275
* [AArch64] Emit HINT instead of PAC insns in Armv8.2-A or belowPablo Barrio2020-01-131-15/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The Pointer Authentication Extension (PAC) was added in Armv8.3-A. Some instructions are implemented in the HINT space to allow compiling code common to CPUs regardless of whether they feature PAC or not, and still benefit from PAC protection in the PAC-enabled CPUs. The 8.3-specific mnemonics were currently enabled in any architecture, and LLVM was emitting them in assembly files when PAC code generation was enabled. This was ok for compilations where both LLVM codegen and the integrated assembler were used. However, the LLVM codegen was not compatible with other assemblers (e.g. GAS). Given the fact that the approach from these assemblers (i.e. to disallow Armv8.3-A mnemonics if compiling for Armv8.2-A or lower) is entirely reasonable, this patch makes LLVM to emit HINT when building for Armv8.2-A and below, instead of PACIASP, AUTIASP and friends. Then, LLVM assembly should be compatible with other assemblers. Reviewers: samparker, chill, LukeCheeseman Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71658
* [MIPS] Don't emit R_(MICRO)MIPS_JALR relocations against data symbolsAlex Richardson2020-01-131-0/+9
| | | | | | | | | | | | | | The R_(MICRO)MIPS_JALR optimization only works when used against functions. Using the relocation against a data symbol (e.g. function pointer) will cause some linkers that don't ignore the hint in this case (e.g. LLD prior to commit 5bab291b7b) to generate a relative branch to the data symbol which crashes at run time. Before this patch, LLVM was erroneously emitting these relocations against local-dynamic TLS function pointers and global function pointers with internal visibility. Reviewers: atanasyan, jrtc27, vstefanovic Reviewed By: atanasyan Differential Revision: https://reviews.llvm.org/D72571
* [X86] Fix MSVC "truncation from 'int' to 'bool'" warning. NFCI.Simon Pilgrim2020-01-131-2/+2
|
* ARMLowOverheadLoops: return earlier to avoid printing irrelevant dbg msg. NFCSjoerd Meijer2020-01-131-0/+1
|
* This option allows selecting the TLS size in the local exec TLS model,KAWASHIMA Takahiro2020-01-133-26/+117
| | | | | | | | | | | | | | | | | | which is the default TLS model for non-PIC objects. This allows large/ many thread local variables or a compact/fast code in an executable. Specification is same as that of GCC. For example, the code model option precedes the TLS size option. TLS access models other than local-exec are not changed. It means supoort of the large code model is only in the local exec TLS model. Patch By KAWASHIMA Takahiro (kawashima-fj <t-kawashima@fujitsu.com>) Reviewers: dmgreen, mstorsjo, t.p.northover, peter.smith, ostannard Reviewd By: peter.smith Committed by: peter.smith Differential Revision: https://reviews.llvm.org/D71688
* [RISCV] Collect Statistics on Compressed InstructionsSam Elliott2020-01-132-0/+14
| | | | | | | | | | | | | | | | | Summary: It is useful to keep statistics on how many instructions we have compressed, so we can see if future changes are increasing or decreasing this number. Reviewers: asb, luismarques Reviewed By: asb, luismarques Subscribers: xbolva00, sameer.abuasal, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67495
* [X86] Use SDNPOptInGlue instead of SDNPInGlue on a couple SDNodes.Craig Topper2020-01-121-2/+2
| | | | | At least one of these is used without a Glue. This doesn't seem to change the X86GenDAGISel.inc output so maybe it doesn't matter?
* AMDGPU/GlobalISel: Don't use XEXEC class for SGPRsMatt Arsenault2020-01-121-1/+1
| | | | | We don't use the xexec register classes for arbitrary values anymore. Avoids a test variance beween GlobalISel and SelectionDAG>
* AMDGPU/GlobalISel: Copy type when inserting readfirstlaneMatt Arsenault2020-01-121-0/+2
| | | | | getDefIgnoringCopies will fail to find any def if no type is set if we try to use it on the use's operand, so propagate the type.
* [RISCV] Check register class for AMO memory operandsJames Clarke2020-01-132-1/+6
| | | | | | | | | | | | | | | | | | | Summary: AMO memory operands use a custom parser in order to accept both (reg) and 0(reg). However, the validation predicate used for these operands was only checking that they were registers, and not the register class, so non-GPRs (such as FPRs) were also accepted. Thus, fix this by making the predicate check that they are GPRs. Reviewers: asb, lenary Reviewed By: asb, lenary Subscribers: hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72471
* [PowerPC] Delete PPCDarwinAsmPrinter and PPCMCAsmInfoDarwinFangrui Song2020-01-124-201/+1
| | | | | | | | Darwin support has been removed. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D72063
* [X86][AVX] Use lowerShuffleAsLanePermuteAndSHUFP to lower binary v4f64 shuffles.Simon Pilgrim2020-01-121-0/+12
| | | | | | Only perform this if we are shuffling lower and upper lane elements across the lanes (otherwise splitting to lower xmm shuffles would be better). This is a regression if we shuffle build_vectors due to getVectorShuffle canonicalizing 'blend of splat' build vectors, for now I've set this not to shuffle build_vector nodes at all to avoid this.
* [X86][AVX] lowerShuffleAsLanePermuteAndSHUFP - only set the demanded ↵Simon Pilgrim2020-01-121-2/+1
| | | | | | elements of the lane mask. Fixes an cyclic dependency issue with an upcoming patch where getVectorShuffle canonicalizes masks with splat build vector sources.
* [X86][Disassembler] Merge X86DisassemblerDecoder.cpp into ↵Fangrui Song2020-01-124-1868/+1569
| | | | X86Disassembler.cpp and refactor
* [X86][Disassembler] SimplifyFangrui Song2020-01-123-45/+7
|
* [X86] Don't call LowerSETCC from LowerSELECT for ↵Craig Topper2020-01-111-3/+1
| | | | | | | | | | | STRICT_FSETCC/STRICT_FSETCCS nodes. This causes the STRICT_FSETCC/STRICT_FSETCCS nodes to lowered early while lowering SELECT, but the output chain doesn't get connected. Then we visit the node again when it is its turn because we haven't replaced the use of the chain result. In the case of the fp128 libcall lowering, after D72341 this will cause the libcall to be emitted twice.
* [TargetLowering][X86] Connect the chain from STRICT_FSETCC in ↵Craig Topper2020-01-111-2/+4
| | | | TargetLowering::expandFP_TO_UINT and X86TargetLowering::FP_TO_INTHelper.
OpenPOWER on IntegriCloud