summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [PowerPC] Separate Features that are known to be Power9 specific from Future CPUStefan Pintilie2019-11-271-4/+13
| | | | | | | | The Power 9 CPU has some features that are unlikely to be passed on to future versions of the CPU. This patch separates this out so that future CPU does not inherit them. Differential Revision: https://reviews.llvm.org/D70466
* [PowerPC] Add new Future CPU for PowerPC in LLVMStefan Pintilie2019-11-275-4/+23
| | | | | | | | | | This is a continuation of D70262 The previous patch as listed above added the future CPU in clang. This patch adds the future CPU in the PowerPC backend. At this point the patch simply assumes that a future CPU will have the same characteristics as pwr9. Those characteristics may change with later patches. Differential Revision: https://reviews.llvm.org/D70333
* [x86] make SLM extract vector element more expensive than defaultSanjay Patel2019-11-271-0/+14
| | | | | | | | | | | | | | | | | | | I'm not sure what the effect of this change will be on all of the affected tests or a larger benchmark, but it fixes the horizontal add/sub problems noted here: https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc The costs are based on reciprocal throughput numbers in Agner's tables for PEXTR*; these appear to be very slow ops on Silvermont. This is a small step towards the larger motivation discussed in PR43605: https://bugs.llvm.org/show_bug.cgi?id=43605 Also, it seems likely that insert/extract is the source of perf regressions on other CPUs (up to 30%) that were cited as part of the reason to revert D59710, so maybe we'll extend the table-based approach to other subtargets. Differential Revision: https://reviews.llvm.org/D70607
* [ARM][MVE][Intrinsics] Add MVE VAND/VORR/VORN/VEOR/VBIC intrinsics. Add unit ↵Mark Murray2019-11-271-45/+53
| | | | | | | | | | | | | | tests. Summary: Add MVE VAND/VORR/VORN/VEOR/VBIC intrinsics. Add unit tests. Reviewers: simon_tatham, ostannard, dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70547
* [ARM][MVE][Intrinsics] Add MVE VMUL intrinsics. Remove annoying "t1" from ↵Mark Murray2019-11-271-21/+49
| | | | | | | | | | | | | | VMUL* instructions. Add unit tests. Summary: Add MVE VMUL intrinsics. Remove annoying "t1" from VMUL* instructions. Add unit tests. Reviewers: simon_tatham, ostannard, dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70546
* [ARM][MVE][Intrinsics] Add MVE VABD intrinsics. Add unit tests.Mark Murray2019-11-271-9/+53
| | | | | | | | | | | | Summary: Add MVE VABD intrinsics. Add unit tests. Reviewers: simon_tatham, ostannard, dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70545
* [ARM] Replace arm_neon_vqadds with sadd_satDavid Green2019-11-272-28/+31
| | | | | | | | | | This replaces the A32 NEON vqadds, vqaddu, vqsubs and vqsubu intrinsics with the target independent sadd_sat, uadd_sat, ssub_sat and usub_sat. This helps generate vqadds from standard IR nodes, which might be produced from the vectoriser. The old variants are removed in the process. Differential Revision: https://reviews.llvm.org/D69350
* AArch64: support the Apple NEON syntax for v8.2 crypto instructions.Tim Northover2019-11-271-11/+15
| | | | Very simple change, just adding the extra syntax variant.
* [X86] [Win64] Avoid truncating large (> 32 bit) stack allocationsMartin Storsjö2019-11-271-1/+1
| | | | | | | This fixes PR44129, which was broken in a7adc3185b (in 7.0.0 and newer). Differential Revision: https://reviews.llvm.org/D70741
* [PowerPC] [NFC] change PPCLoopPreIncPrep class name after D67088.czhengsz2019-11-263-30/+30
| | | | | | | | | | Afer https://reviews.llvm.org/D67088, PPCLoopPreIncPrep pass can prepare more instruction forms except pre inc form, like DS/DQ forms. This patch is a follow-up of https://reviews.llvm.org/D67088 to rename the pass name. Reviewed by: jsji Differential Revision: https://reviews.llvm.org/D70371
* [PowerPC] [NFC] rename PPCLoopPreIncPrep.cpp to PPCLoopInstrFormPrep.cpp ↵Jinsong Ji2019-11-272-1/+1
| | | | | | | | | | | | | | | | | | | after D67088 Summary: This is NFC code clean work after D67088. In that patch, we extend loop instructions prep for ds/dq form. This patch only changes the file name PPCLoopPreIncPrep.cpp to PPCLoopInstrFormPrep.cpp for better reviewing of the content change of file PPCLoopInstrFormPrep.cpp. Reviewers: #powerpc, nemanjai, steven.zhang, shchenz Reviewed By: #powerpc, shchenz Subscribers: wuzish, mgorny, hiraditya, kbarton, shchenz, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70716
* [mips] Fix sc, scs, ll, lld instructions expandingSimon Atanasyan2019-11-278-88/+143
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are a couple of bugs with the sc, scs, ll, lld instructions expanding: 1. On R6 these instruction pack immediate offset into a 9-bit field. Now if an immediate exceeds 9-bits assembler does not perform expansion and just rejects such instruction. 2. On 64-bit non-PIC code if an operand is a symbol assembler generates incorrect sequence of instructions. It uses R_MIPS_HI16 and R_MIPS_LO16 relocations and skips R_MIPS_HIGHEST and R_MIPS_HIGHER ones. To solve these problems this patch: - Introduces `mem_simm9_exp` to mark 9-bit memory immediate operands which require expansion. Probably later all `mem_simm9` operands will be able to migrate on `mem_simm9_exp` and we rename it to `mem_simm9`. - Adds new `OPERAND_MEM_SIMM9` operand type and assigns it to the `mem_simm9_exp`. That allows to know operand size in the `processInstruction` method and decide whether we need to expand instruction. - Adds `expandMem9Inst` method to expand instructions with 9-bit memory immediate operand. This method just load immediate into a "base" register used by origibal instruction: sc $2, 256($sp) => addiu $1, $sp, 256 sc $2, 0($1) - Fix `expandMem16Inst` to support a correct set of relocations for symbol loading in case of 64-bit non-PIC code. ll $12, symbol => lui $12, 0 R_MIPS_HIGHEST symbol daddiu $12, $12, 0 R_MIPS_HIGHER symbol dsll $12, $12, 16 daddiu $12, $12, 0 R_MIPS_HI16 symbol dsll $12, $12, 16 ll $12, 0($12) R_MIPS_LO16 symbol - Fix `expandMem16Inst` to unify handling of 3 and 4 operands instructions. - Delete unused now `MipsTargetStreamer::emitSCWithSymOffset` method. Task for next patches - implement expanding for other instructions use `mem_simm9` operand and other `mem_simm##` operands. Differential Revision: https://reviews.llvm.org/D70648
* [X86] Add strict fp support for operations of X87 instructionsCraig Topper2019-11-263-19/+47
| | | | | | | | | | This is the following patch of D68854. This patch adds basic operations of X87 instructions, including +, -, *, / , fp extensions and fp truncations. Patch by Chen Liu(LiuChen3) Differential Revision: https://reviews.llvm.org/D68857
* [ARM] Clean up the load and store code. NFCDavid Green2019-11-261-263/+246
| | | | | | | | | Some of these patterns have grown quite organically. I've tried to organise them a little here, moving all the PatFlags together and giving them a more consistent naming scheme, to allow some of the later patterns to be merged into a single multiclass. Differential Revision: https://reviews.llvm.org/D70178
* [Codegen][ARM] Add addressing modes from masked loads and storesDavid Green2019-11-266-106/+310
| | | | | | | | | | | | | | | | | | | | | | | | | | | | MVE has a basic symmetry between it's normal loads/store operations and the masked variants. This means that masked loads and stores can use pre-inc and post-inc addressing modes, just like the standard loads and stores already do. To enable that, this patch adds all the relevant infrastructure for treating masked loads/stores addressing modes in the same way as normal loads/stores. This involves: - Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra Offset operand that is added after the PtrBase. - Extending the IndexedModeActions from 8bits to 16bits to store the legality of masked operations as well as normal ones. This array is fairly small, so doubling the size still won't make it very large. Offset masked loads can then be controlled with setIndexedMaskedLoadAction, similar to standard loads. - The same methods that combine to indexed loads, such as CombineToPostIndexedLoadStore, are adjusted to handle masked loads in the same way. - The ARM backend is then adjusted to make use of these indexed masked loads/stores. - The X86 backend is adjusted to hopefully be no functional changes. Differential Revision: https://reviews.llvm.org/D70176
* [XCOFF][AIX] Check linkage on the function, and two fixes for commentsjasonliu2019-11-261-7/+11
| | | | | | This is a follow up commit to address post-commit comment in D70443 Differential revision: https://reviews.llvm.org/D70443
* [AMDGPU] Fix emitIfBreak CF lowering: use temp reg to make register ↵vpykhtin2019-11-261-2/+5
| | | | | | coalescer life easier. Differential revision: https://reviews.llvm.org/D70405
* [RISCV] Handle fcopysign(f32, f64) and fcopysign(f64, f32)Luís Marques2019-11-261-0/+3
| | | | | | | | | | | | Summary: Adds tablegen patterns to explicitly handle fcopysign where the magnitude and sign arguments have different types, due to the sign value casts being removed the by DAGCombiner. Support for RV32IF follows in a separate commit. Adds tests for all relevant scenarios except RV32IF. Reviewers: lenary Reviewed By: lenary Tags: #llvm Differential Revision: https://reviews.llvm.org/D70678
* [X86][MC] no error diagnostic for out-of-range jrcxz/jecxz/jcxzAlexey Lapshin2019-11-261-6/+21
| | | | | | | | | | | | | | Fix for PR24072: X86 instructions jrcxz/jecxz/jcxz performs short jumps if rcx/ecx/cx register is 0 The maximum relative offset for a forward short jump is 127 Bytes (0x7F). The maximum relative offset for a backward short jump is 128 Bytes (0x80). Gnu assembler warns when the distance of the jump exceeds the maximum but llvm-as does not. Patch by Konstantin Belochapka and Alexey Lapshin Differential Revision: https://reviews.llvm.org/D70652
* [AArch64][SVE] Implement floating-point conversion intrinsicsKerry McLaughlin2019-11-262-41/+64
| | | | | | | | | | | | | | | | | | | | Summary: Adds intrinsics for the following: - fcvt - fcvtzs & fcvtzu - scvtf & ucvtf - fcvtlt, fcvtnt - fcvtx & fcvtxnt Reviewers: huntergr, sdesmalen, dancgr, mgudim, efriedma Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70180
* [ARM][ReachingDefs] Remove dead code in loloops.Sam Parker2019-11-261-49/+140
| | | | | | | | | | | | Add some more helper functions to ReachingDefs to query the uses of a given MachineInstr and also to query whether two MachineInstrs use the same def of a register. For Arm, while tail-predicating, these helpers are used in the low-overhead loops to remove the dead code that calculates the number of loop iterations. Differential Revision: https://reviews.llvm.org/D70240
* [SystemZ] Don't build a PPA instruction with an immediate 0 operand.Jonas Paulsson2019-11-262-3/+7
| | | | | | | | | | | | The improvement in the machine verifier for operand types (D63973) discovered a bad operand in a test using a PPA instruction. It was an immediate 0 where a register was expected. This patch fixes this (NFC) by now making the PPA second register operand NoRegister instead of a zero immediate in the MIR. Review: Ulrich Weigand https://reviews.llvm.org/D70501
* [ARM][ReachingDefs] RDA in LoLoopsSam Parker2019-11-261-93/+39
| | | | | | | | | | | | | | | | | Add several new methods to ReachingDefAnalysis: - getReachingMIDef, instead of returning an integer, return the MachineInstr that produces the def. - getInstFromId, return a MachineInstr for which the given integer corresponds to. - hasSameReachingDef, return whether two MachineInstr use the same def of a register. - isRegUsedAfter, return whether a register is used after a given MachineInstr. These methods have been used in ARMLowOverhead to replace searching for uses/defs. Differential Revision: https://reviews.llvm.org/D70009
* [ARM][ConstantIslands] Correct block size updateSam Parker2019-11-261-10/+10
| | | | | | | | | When inserting a non-decrementing LE, the basic block was being resized to take into consideration that a tCMP and tBcc had been combined into one T1 instruction. This is not true in the LE case where we generate a CBN?Z and an LE. Differential Revision: https://reviews.llvm.org/D70536
* [X86] Return Op instead of SDValue() for lowering flags_read/write intrinsicsCraig Topper2019-11-251-1/+1
| | | | | | | | | | Returning SDValue() means we didn't handle it and the common code should try to expand it. But its a target intrinsic so expanding won't do anything and just leave the node alone. But it will print confusing debug messages. By returning Op we tell the common code that the node is legal and shouldn't receive any further processing.
* [BPF] add "llvm." prefix to BPF internally created globalsYonghong Song2019-11-251-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, BPF backend creates some global variables with name like <type_name>:<reloc_type>:<patch_imm>$<access_str> to carry certain information to BPF backend. With direct clang compilation, the following code in llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp is triggered and the above globals are emitted to the ELF file. (clang enabled this as opt flag -faddrsig is on by default.) if (TM.Options.EmitAddrsig) { // Emit address-significance attributes for all globals. OutStreamer->EmitAddrsig(); for (const GlobalValue &GV : M.global_values()) if (!GV.use_empty() && !GV.isThreadLocal() && !GV.hasDLLImportStorageClass() && !GV.getName().startswith("llvm.") && !GV.hasAtLeastLocalUnnamedAddr()) OutStreamer->EmitAddrsigSym(getSymbol(&GV)); } ... 10162: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND tcp_sock:0:2048$0:117 10163: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND tcp_sock:0:2112$0:126:0 10164: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND tcp_sock:1:8$0:31:6 ... While in llc, those globals are not emited since EmitAddrsig default option is false for llc. The llc flag "-addrsig" can be used to enable the above code. This patch added "llvm." prefix to these internal globals so that they can be ignored in the above codes and possible other places. Differential Revision: https://reviews.llvm.org/D70703
* [X86] Add support for STRICT_FP_ROUND/STRICT_FP_EXTEND from/to fp128 to/from ↵Craig Topper2019-11-251-17/+37
| | | | | | | | | | f32/f64/f80 in 64-bit mode. These need to emit a libcall like we do for the non-strict version. 32-bit mode needs to SoftenFloat support to be implemented for strict FP nodes. Differential Revision: https://reviews.llvm.org/D70504
* [X86] Add proper execution domain information to the avx512vnni instructions.Craig Topper2019-11-251-0/+2
|
* [PowerPC] Rename DarwinDirective to CPUDirective (NFC)Kit Barton2019-11-258-47/+50
| | | | | | | | | | | | | | | | | | | | | Summary: This patch renames the DarwinDirective (used to identify which CPU was defined) to CPUDirective. It also adds the getCPUDirective() method and replaces all uses of getDarwinDirective() with getCPUDirective(). Once this patch lands and downstream users of the getDarwinDirective() method have switched to the getCPUDirective() method, the old getDarwinDirective() method will be removed. Reviewers: nemanjai, hfinkel, power-llvm-team, jsji, echristo, #powerpc, jhibbits Reviewed By: hfinkel, jsji, jhibbits Subscribers: hiraditya, shchenz, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70352
* [ARM] Generate CMSE instructions from CMSE intrinsicsMomchil Velikov2019-11-251-4/+12
| | | | | | | | | | | This patch adds instruction selection patterns for the TT, TTT, TTA, and TTAT instructions and tests for llvm.arm.cmse.tt, llvm.arm.cmse.ttt, llvm.arm.cmse.tta, and llvm.arm.cmse.ttat intrinsics (added in a previous patch). Patch by Javed Absar. Differential Revision: https://reviews.llvm.org/D70407
* [SystemZ] Return the right offsets from getCalleeSavedSpillSlots().Jonas Paulsson2019-11-251-21/+28
| | | | | | | | | | | | | | // Due to the SystemZ ABI, the DWARF CFA (Canonical Frame Address) is not // equal to the incoming stack pointer, but to incoming stack pointer plus // 160. The getOffsetOfLocalArea() returned value is interpreted as "the // offset of the local area from the CFA". The immediate offsets into the Register save area returned by getCalleeSavedSpillSlots() should take this offset into account, which this patch makes sure of. Patch and review by Ulrich Weigand. https://reviews.llvm.org/D70427
* [PowerPC] Fix VSX clobbers of CSR registersNemanja Ivanovic2019-11-251-0/+11
| | | | | | | | If an inline asm statement clobbers a VSX register that overlaps with a callee-saved Altivec register or FPR, we will not record the clobber and will therefore violate the ABI. This is clearly a bug so this patch fixes it. Differential revision: https://reviews.llvm.org/D68576
* [AMDGPU] Fix function name in debug outputJay Foad2019-11-251-3/+3
|
* [AIX][XCOFF] Generate undefined symbol in symbol table for external function ↵jasonliu2019-11-251-2/+14
| | | | | | | | | | | | | | | | call Summary: This patch sets up the infrastructure for 1. Associate MCSymbolXCOFF with an MCSectionXCOFF when it could not get implicitly associated. 2. Generate undefined symbols. The patch itself generates undefined symbol for external function call only. Generate undefined symbol for external global variable and external function descriptors will be handled in separate patch(s) after this is land. Differential Revision: https://reviews.llvm.org/D70443
* [ARM][MVE] Select vqnegAnna Welker2019-11-251-11/+19
| | | | | | | | Adds a pattern to ARMInstrMVE.td to use a VQNEG instruction if an equivalent multi-instruction construct is found. Differential Revision: https://reviews.llvm.org/D70491
* [AVR] Fix endianness handling in AVR MCserge_sans_paille2019-11-251-5/+3
| | | | Differential Revision: https://reviews.llvm.org/D67926
* Revert "[PowerPC] combine rlwinm+rlwinm to rlwinm"czhengsz2019-11-241-100/+0
| | | | This reverts commit 29f6f9b2b2bfecccf903738e2f5a0cd0a70fce31.
* [PowerPC] Spill CR LT bits on P9 using setbAmy Kwan2019-11-241-0/+15
| | | | | | | | | | | | | | | | This patch aims to spill CR[0-7]LT bits on POWER9 using the setb instruction. The sequence on P9 to spill these bits will be: setb %reg, %CRREG stw %reg, $FI Instead of the typical sequence: mfocrf %reg, %CRREG rlwinm %reg1, %reg, $SH, 0, 0 stw %reg1, $FI Differential Revision: https://reviews.llvm.org/D68443
* Reland 'Fixed -Wdeprecated-copy warnings. NFCI.'Dávid Bolvanský2019-11-232-2/+16
| | | | Fixed hashtable copy ctor.
* Revert 'Fixed -Wdeprecated-copy warnings. NFCI.'Dávid Bolvanský2019-11-232-16/+2
| | | | pdbutil's test is failing.
* Fixed -Wdeprecated-copy warnings. NFCI.Dávid Bolvanský2019-11-232-2/+16
|
* [NFC] [AArch64] Fix wrong documentation for IsStoreRegOffsetOpDavid Tellenbach2019-11-231-1/+1
|
* AMDGPU: Handle waitcnt overflowAustin Kerbow2019-11-231-1/+4
| | | | | | | | | | | | | | | | | | | | | | Summary: The waitcnt pass can overflow the counters when the number of outstanding events for a type exceed the capacity of the counter. This can lead to inefficient insertion of waitcnts, or to waitcnt instructions with max values for each type. The last situation can cause an instruction which when disassembled appears to be an illegal waitcnt without an operand. In these cases we should add a wait for the 'counter maximum' - 1, and update the waitcnt brackets accordingly. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70418
* [X86][SSE] Split off generic isLaneCrossingShuffleMask helper. NFC.Simon Pilgrim2019-11-231-3/+14
| | | | Avoid MVT dependency which will be needed in a future patch.
* [AArch64] Add the pipeline model for Exynos M5Evandro Menezes2019-11-222-1/+1014
| | | | Add the scheduling and cost models for Exynos M5.
* [WebAssembly][SelectionDAG] Remove unused WebAssemblyDAGToDAGISel::ForCodeSize.Hiroshi Yamauchi2019-11-221-4/+1
| | | | | | | | | | | | | | | | | | | | | | Summary: This follows from the discussion at D70095. D70095 moves hasOptSize calls into SelectionDAG::shouldOptForSize to allow querying size optimization conditions together with profile guided size optimization. Since it appears that size optimizations for WebAssembly SelectionDAG haven't been implemented yet and thus ForCodeSize is unused, and it would not make a lot of sense to call shouldOptForSize here as the necessary profile data like PSI/BFI aren't available at this point, it seems good and less confusing to remove this for now and use shouldOptForSize when they are implemented in the future. Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70567
* [BPF] Fix a recursion bug in BPF Peephole ZEXT optimizationYonghong Song2019-11-221-2/+13
| | | | | | | | | | | | | | | | | | | Commit a0841dfe8594 ("[BPF] Fix a bug in peephole optimization") fixed a bug in peephole optimization. Recursion is introduced to handle COPY and PHI instructions. Unfortunately, multiple PHI instructions may form a cycle and this will cause infinite recursion, eventual segfault. For Commit a0841dfe8594, I indeed tried a few loops to ensure that I won't see the recursion, but I did not try with complex control flows, which, as demonstrated with the test case in this patch, may introduce PHI cycles. This patch fixed the issue by introducing a set to remember visited PHI instructions. This way, cycles can be properly detected and handled. Differential Revision: https://reviews.llvm.org/D70586
* [PowerPC] Implement the vector extend sign instruction pattern matchQingShan Zhang2019-11-222-0/+14
| | | | | | | Power9 has instructions to implement the semantics of SIGN_EXTEND_INREG for vector type. Mark it as legal and add the match pattern. Differential Revision: https://reviews.llvm.org/D69601
* [PowerPC] combine rlwinm+rlwinm to rlwinmczhengsz2019-11-221-0/+100
| | | | | | | | | | | | | combine x3 = rlwinm x3, 27, 5, 31 x3 = rlwinm x3, 19, 0, 12 to x3 = rlwinm x3, 14, 0, 12 Reviewed by: steven.zhang Differential Revision: https://reviews.llvm.org/D70374
* [X86] Add test cases for most of the constrained fp libcalls with fp128.Craig Topper2019-11-211-4/+8
| | | | | | | | | | Add explicit setOperation actions for some to match their none strict counterparts. This isn't required, but makes the code self documenting that we didn't forget about strict fp. I've used LibCall instead of Expand since that's more explicitly what we want. Only lrint/llrint/lround/llround are missing now.
OpenPOWER on IntegriCloud