summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [MachineOutliner][NFC] Make Candidates own their call informationJessica Paquette2018-07-244-41/+55
| | | | | | | | | | | | | Before this, TCI contained all the call information for each Candidate. This moves that information onto the Candidates. As a result, each Candidate can now supply how it ought to be called. Thus, Candidates will be able to, say, call the same function in cheaper ways when possible. This also removes that information from TCI, since it's no longer used there. A follow-up patch for the AArch64 outliner will demonstrate this. llvm-svn: 337840
* [mips] Fix local dynamic TLS with Sym64Simon Atanasyan2018-07-246-22/+20
| | | | | | | | | | | | | | | | For the final DTPREL addition, rather than a lui/daddiu/daddu triple, LLVM was erronously emitting a daddiu/daddiu pair, treating the %dtprel_hi as if it were a %dtprel_lo, since Mips::Hi expands unshifted for Sym64. Instead, use a new TlsHi node and, although unnecessary due to the exact structure of the nodes emitted, use TlsHi for local exec too to prevent future bugs. Also garbage-collect the unused TprelLo and TlsGd nodes, and TprelHi since its functionality is provided by the new common TlsHi node. Patch by James Clarke. Differential revision: https://reviews.llvm.org/D49259 llvm-svn: 337827
* [x86/SLH] Extract the core register hardening logic to a low-levelChandler Carruth2018-07-241-36/+73
| | | | | | | | | | | | | | | | | helper and restructure the post-load hardening to use this. This isn't as trivial as I would have liked because the post-load hardening used a trick that only works for it where it swapped in a temporary register to the load rather than replacing anything. However, there is a simple way to do this without that trick that allows this to easily reuse a friendly API for hardening a value in a register. That API will in turn be usable in subsequent patcehs. This also techincally changes the position at which we insert the subreg extraction for the predicate state, but that never resulted in an actual instruction and so tests don't change at all. llvm-svn: 337825
* [x86/SLH] Tidy up a comment, using doxygen structure and wording it toChandler Carruth2018-07-241-5/+7
| | | | | | be more accurate and understandable. llvm-svn: 337822
* [ARM] Disable ARMCodeGenPrepare by defaultSam Parker2018-07-241-1/+1
| | | | | | | | ARM Stage 2 builders have been suspiciously broken since the pass was committed. Disabling to hopefully fix the bots and give me time to debug. llvm-svn: 337821
* AMDGPU/GlobalISel: Legalize G_INSERTTom Stellard2018-07-241-1/+1
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49601 llvm-svn: 337798
* AMDGPU/GlobalISel: Remove unnecessary legality constraint for G_EXTRACTTom Stellard2018-07-241-3/+0
| | | | | | | | | | | | | | | | | | Summary: We were marking G_EXTRACT operations unsupported if the output type was larger than the input type. I don't see how this could ever actually happen, so I dropped the constraint. Doing this makes it possible to reuse the same legality code for G_INSERT. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49600 llvm-svn: 337794
* [x86/SLH] Simplify the code for hardening a loaded value. NFC.Chandler Carruth2018-07-241-20/+15
| | | | | | | This is in preparation for extracting this into a re-usable utility in this code. llvm-svn: 337785
* [x86/SLH] Remove complex SHRX-based post-load hardening.Chandler Carruth2018-07-241-73/+10
| | | | | | | | | | | | | | | | | | | This code was really nasty, had several bugs in it originally, and wasn't carrying its weight. While on Zen we have all 4 ports available for SHRX, on all of the Intel parts with Agner's tables, SHRX can only execute on 2 ports, giving it 1/2 the throughput of OR. Worse, all too often this pattern required two SHRX instructions in a chain, hurting the critical path by a lot. Even if we end up needing to safe/restore EFLAGS, that is no longer so bad. We pay for a uop to save the flag, but we very likely get fusion when it is used by forming a test/jCC pair or something similar. In practice, I don't expect the SHRX to be a significant savings here, so I'd like to avoid the complex code required. We can always resurrect this if/when someone has a specific performance issue addressed by it. llvm-svn: 337781
* [AArch64] Use MCAsmInfoMicrosoft and MCAsmInfoGNUCOFF as base classesMartin Storsjo2018-07-232-9/+14
| | | | | | | | | | | | | | | | This matches the structure used on X86 and ARM. This requires a little bit of duplication of the parts that are equal in both AArch64 COFF variants though. Before SVN r335286, these classes didn't add anything that MCAsmInfoCOFF didn't, but now they do. This makes AArch64 match X86 in how comdat is used for float constants for MinGW. Differential Revision: https://reviews.llvm.org/D49637 llvm-svn: 337755
* Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code ↵Reid Kleckner2018-07-236-29/+132
| | | | | | | | | | | | | | models" Don't try to generate large PIC code for non-ELF targets. Neither COFF nor MachO have relocations for large position independent code, and users have been using "large PIC" code models to JIT 64-bit code for a while now. With this change, if they are generating ELF code, their JITed code will truly be PIC, but if they target MachO or COFF, it will contain 64-bit immediates that directly reference external symbols. For a JIT, that's perfectly fine. llvm-svn: 337740
* [Hexagon] Handle unnamed globals in HexagonConstExprKrzysztof Parzyszek2018-07-231-3/+15
| | | | | | Instead of comparing names, compare positions in the parent module. llvm-svn: 337723
* [ARM] Use unique_ptr to fix memory leak introduced in r337701Fangrui Song2018-07-231-11/+9
| | | | llvm-svn: 337714
* OpChain has subclasses, so add a virtual destructor.Jordan Rupprecht2018-07-231-0/+1
| | | | | | | | | | | | | | | Summary: OpChain has subclasses, so add a virtual destructor. This fixes an issue when deleting subclasses of OpChain (see MatchSMLAD() specifically) in r337701. Reviewers: javed.absar Subscribers: llvm-commits, SjoerdMeijer, samparker Differential Revision: https://reviews.llvm.org/D49681 llvm-svn: 337713
* [ARM] Follow-up to r337709.Matt Morehouse2018-07-231-2/+0
| | | | | | Fix double-free. llvm-svn: 337711
* [ARM] Add doFinalization() to ARMCodeGenPrepare pass.Matt Morehouse2018-07-231-0/+6
| | | | | | | Attempt to fix the leak introduced in r337687 and make sanitizer buildbots green again. llvm-svn: 337709
* [ARM][NFC] ParallelDSP reorganisationSam Parker2018-07-231-88/+103
| | | | | | | | | | | | | | | | | In preparing to allow ARMParallelDSP pass to parallelise more than smlads, I've restructed some elements: - The ParallelMAC struct has been renamed to BinOpChain. - The BinOpChain struct holds two value lists: LHS and RHS, as well as inheriting from the OpChain base class. - The OpChain struct holds all the values of the represented chain and has had the memory locations functionality inserted into it. - ParallelMACList becomes OpChainList and it now holds pointers instead of objects. Differential Revision: https://reviews.llvm.org/D49020 llvm-svn: 337701
* [SystemZ] Fix dumpSU() method in SystemZHazardRecognizer.Jonas Paulsson2018-07-231-1/+5
| | | | | | | | Two minor issues: The new MCD SchedWrite name does not contain "Unit" like all the others, so a check is needed. Also, print "LSU" instead of "LS". Review: Ulrich Weigand llvm-svn: 337700
* [ARM] ARMCodeGenPrepare backend passSam Parker2018-07-234-0/+757
| | | | | | | | | | | | | | | | | | | | | | Arm specific codegen prepare is implemented to perform type promotion on icmp operands, which can enable the removal of uxtb and uxth (unsigned extend) instructions. This is possible because performing type promotion before ISel alleviates this duty from the DAG builder which has to perform legalisation, but has a limited view on data ranges. The pass visits any instruction operand of an icmp and creates a worklist to traverse the use-def tree to determine whether the values can simply be promoted. Our concern is values in the registers overflowing the narrow (i8, i16) data range, so instructions marked with nuw can be promoted easily. For add and sub instructions, we are able to use the parallel dsp instructions to operate on scalar data types and avoid overflowing bits. Underflowing adds and subs are also permitted when the result is only used by an unsigned icmp. Differential Revision: https://reviews.llvm.org/D48832 llvm-svn: 337687
* [NFC][MCA] ZnVer1: Update RegisterFile to identify false dependencies on ↵Roman Lebedev2018-07-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | partially written registers. Summary: Pretty mechanical follow-up for D49196. As microarchitecture.pdf notes, "20 AMD Ryzen pipeline", "20.8 Register renaming and out-of-order schedulers": The integer register file has 168 physical registers of 64 bits each. The floating point register file has 160 registers of 128 bits each. "20.14 Partial register access": The processor always keeps the different parts of an integer register together. ... An instruction that writes to part of a register will therefore have a false dependence on any previous write to the same register or any part of it. Reviewers: andreadb, courbet, RKSimon, craig.topper, GGanesh Reviewed By: GGanesh Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D49393 llvm-svn: 337676
* [x86/SLH] Fix a bug where we would harden tail calls twice -- once asChandler Carruth2018-07-231-1/+5
| | | | | | | | | a call, and then again as a return. Also added a comment to try and explain better why we would be doing what we're doing when hardening the (non-call) returns. llvm-svn: 337673
* [x86/SLH] Rename and comment the main hardening function. NFC.Chandler Carruth2018-07-231-4/+21
| | | | | | | | | | This provides an overview of the algorithm used to harden specific loads. It also brings this our terminology further in line with hardening rather than checking. Differential Revision: https://reviews.llvm.org/D49583 llvm-svn: 337667
* [X86] Remove the max vector width restriction from combineLoopMAddPattern ↵Craig Topper2018-07-221-7/+1
| | | | | | | | and rely splitOpsAndApply to handle splitting. This seems to be a net improvement. There's still an issue under avx512f where we have a 512-bit vpaddd, but not vpmaddwd so we end up doing two 256-bit vpmaddwds and inserting the results before a 512-bit vpaddd. It might be better to do two 512-bits paddds with zeros in the upper half. Same number of instructions, but breaks a dependency. llvm-svn: 337656
* [mips] Factor out register class selection for global base register. NFCSimon Atanasyan2018-07-211-18/+20
| | | | | | | Factor out register class selection for global base register into a separate function to escape long chain of ternary operators. llvm-svn: 337647
* [mips] Move out the WrapperPat declaration from the NotInMicroMips predicateSimon Atanasyan2018-07-211-5/+4
| | | | | | | | | | | | | | | This is a follow-up to the rL335185. Those commit adds some WrapperPat patterns for microMIPS target. But declaration of the WrapperPat class is under the NotInMicroMips predicate and microMIPS patterns cannot be selected because predicate (Subtarget->inMicroMipsMode()) && (!Subtarget->inMicroMipsMode()) is always false. This change move out the WrapperPat class declaration from the NotInMicroMips predicate and enables microMIPS WrapperPat patterns. Differential revision: https://reviews.llvm.org/D49533 llvm-svn: 337646
* AMDGPU: Use existing function to check for VGPRsMatt Arsenault2018-07-201-16/+7
| | | | llvm-svn: 337621
* Revert "[X86][AVX] Convert X86ISD::VBROADCAST demanded elts combine to use ↵Benjamin Kramer2018-07-202-48/+17
| | | | | | | | SimplifyDemandedVectorElts" This reverts commit r337547. It triggers an infinite loop. llvm-svn: 337617
* [X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types.Craig Topper2018-07-204-79/+8
| | | | | | | | | | Ideally our ISD node types going into the isel table would have types consistent with their instruction domain. This prevents us having to duplicate patterns with different types for the same instruction. Unfortunately, it seems our shuffle combining is currently relying on this a little remove some bitcasts. This seems to enable some switching between shufps and shufd. Hopefully there's some way we can address this in the combining. Differential Revision: https://reviews.llvm.org/D49280 llvm-svn: 337590
* [X86] Remove what appear to be unnecessary uses of DCI.CombineToCraig Topper2018-07-201-19/+13
| | | | | | | | | | CombineTo is most useful when you need to replace multiple results, avoid the worklist management, or you need to something else after the combine, etc. Otherwise you should be able to just return the new node and let DAGCombiner go through its usual worklist code. All of the places changed in this patch look to be standard cases where we should be able to use the more stand behavior of just returning the new node. Differential Revision: https://reviews.llvm.org/D49569 llvm-svn: 337589
* [X86][XOP] Fix SUB constant folding for VPSHA/VPSHL shift loweringSimon Pilgrim2018-07-201-4/+3
| | | | | | | | We can safely use getConstant here as we're still lowering, which allows constant folding to kick in and simplify the vector shift codegen. Noticed while working on D49562. llvm-svn: 337578
* [ARM] Add new feature to enable optimizing the VFP registersEvandro Menezes2018-07-203-2/+15
| | | | | | | | | Enable the optimization of operations on DPR and SPR via a feature instead of checking the target. Differential revision: https://reviews.llvm.org/D49463 llvm-svn: 337575
* [X86][SSE] Use SplitOpsAndApply to improve HADD/HSUB loweringSimon Pilgrim2018-07-201-8/+20
| | | | | | Improve AVX1 256-bit vector HADD/HSUB matching by using SplitOpsAndApply to split into 128-bit instructions. llvm-svn: 337568
* [X86][AVX] Add support for i16 256-bit vector horizontal op redundant ↵Simon Pilgrim2018-07-201-1/+3
| | | | | | shuffle removal llvm-svn: 337566
* [X86][AVX] Add support for 32/64 bits 256-bit vector horizontal op redundant ↵Simon Pilgrim2018-07-201-5/+11
| | | | | | shuffle removal llvm-svn: 337561
* [X86][AVX] Convert X86ISD::VBROADCAST demanded elts combine to use ↵Simon Pilgrim2018-07-202-17/+48
| | | | | | | | | | SimplifyDemandedVectorElts This is an early step towards using SimplifyDemandedVectorElts for target shuffle combining - this merely moves the existing X86ISD::VBROADCAST simplification code to use the SimplifyDemandedVectorElts mechanism. Adds X86TargetLowering::SimplifyDemandedVectorEltsForTargetNode to handle X86ISD::VBROADCAST - in time we can support all target shuffles (and other ops) here. llvm-svn: 337547
* [SystemZ] Reimplent SchedModel IssueWidth and WriteRes/ReadAdvance mappings.Jonas Paulsson2018-07-205-2933/+3252
| | | | | | | | | | | | | | | | | As a consequence of recent discussions (http://lists.llvm.org/pipermail/llvm-dev/2018-May/123164.html), this patch changes the SystemZ SchedModels so that the IssueWidth is 6, which is the decoder capacity, and NumMicroOps become the number of decoder slots needed per instruction. In addition, the SchedWrite latencies now match the MachineInstructions def-operand indexes, and ReadAdvances have been added on instructions with one register operand and one memory operand. Review: Ulrich Weigand https://reviews.llvm.org/D47008 llvm-svn: 337538
* Improved sched model for X86 BSWAP* instrs.Andrew V. Tischenko2018-07-2011-79/+35
| | | | | | Differential Revision: https://reviews.llvm.org/D49477 llvm-svn: 337537
* Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering"Matt Arsenault2018-07-2013-200/+219
| | | | | | Reverts r337079 with fix for msan error. llvm-svn: 337535
* [AArch64][SVE] Asm: Support for bit/byte reverse operations.Sander de Smalen2018-07-202-0/+94
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the following instructions: RBIT reverse bits within each active elemnt (predicated), e.g. rbit z0.d, p0/m, z1.d for 8, 16, 32 and 64 bit elements. REV reverse order of elements in data/predicate vector (unpredicated), e.g. rev z0.d, z1.d rev p0.d, p1.d for 8, 16, 32 and 64 bit elements. REVB reverse order of bytes within each active element, e.g. revb z0.d, p0/m, z1.d for 16, 32 and 64 bit elements. REVH reverse order of 16-bit half-words within each active element, e.g. revh z0.d, p0/m, z1.d for 32 and 64 bit elements. REVW reverse order of 32-bit words within each active element, e.g. revw z0.d, p0/m, z1.d for 64 bit elements. llvm-svn: 337534
* [AArch64][SVE] Asm: Support for FTMAD instruction.Sander de Smalen2018-07-202-0/+27
| | | | | | | | | | | Floating-point trigonometric multiply-add coefficient, e.g. ftmad z0.h, z0.h, z1.h, #7 with variants for 16, 32 and 64-bit elements. llvm-svn: 337533
* [WebAssembly] Disable a test that violates DR1696Heejin Ahn2018-07-201-0/+1
| | | | | | | | | | | | | | Summary: lifetime2.C violates DR1696, which prevents reference members from being initialized to temporaries, whose lifetime would end at the end of ctor. Reviewers: sbc100 Subscribers: dschuff, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D49577 llvm-svn: 337512
* [x86/SLH] Clean up helper naming for return instruction handling andChandler Carruth2018-07-191-4/+26
| | | | | | | | | | | | | | remove dead declaration of a call instruction handling helper. This moves to the 'harden' terminology that I've been trying to settle on for returns. It also adds a really detailed comment explaining what all we're trying to accomplish with return instructions and why. Hopefully this makes it much more clear what exactly is being "hardened". Differential Revision: https://reviews.llvm.org/D49571 llvm-svn: 337510
* [X86][AVX] Use extract_subvector to reduce vector op widths (PR36761)Simon Pilgrim2018-07-191-0/+25
| | | | | | | | | | We have a number of cases where we fail to reduce vector op widths, performing the op in a larger vector and then extracting a subvector. This is often because by default it would create illegal types. This peephole patch attempts to handle a few common cases detailed in PR36761, which typically involved extension+conversion to vX2f64 types. Differential Revision: https://reviews.llvm.org/D49556 llvm-svn: 337500
* [X86] Fix some 'return SDValue()' after DCI.CombineTo instead return the ↵Craig Topper2018-07-191-13/+7
| | | | | | | | | | output of CombineTo Returning SDValue() means nothing was changed. Returning the result of CombineTo returns the first argument of CombineTo. This is specially detected by DAGCombiner as meaning that something changed, but worklist management was already taken care of. I think the only real effect of this change is that we now properly update the Statistic the counts the number of combines performed. That's the only thing between the check for null and the check for N in the DAGCombiner. llvm-svn: 337491
* [Power9] Code Cleanup - Remove needsAggressiveScheduling()Stefan Pintilie2018-07-191-27/+8
| | | | | | | | | As we already return true from needsAggressiveScheduling() for the most recent hardware it would be cleaner to just return true for all PowerPC hardware. Differential Revision: https://reviews.llvm.org/D48663 llvm-svn: 337488
* [X86][BtVer2] correctly model the latency/throughput of LEA instructions.Andrea Di Biagio2018-07-197-14/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the latency/throughput of LEA instructions in the BtVer2 scheduling model. On Jaguar, A 3-operands LEA has a latency of 2cy, and a reciprocal throughput of 1. That is because it uses one cycle of SAGU followed by 1cy of ALU1. An LEA with a "Scale" operand is also slow, and it has the same latency profile as the 3-operands LEA. An LEA16r has a latency of 3cy, and a throughput of 0.5 (i.e. RThrouhgput of 2.0). This patch adds a new TIIPredicate named IsThreeOperandsLEAFn to X86Schedule.td. The tablegen backend (for instruction-info) expands that definition into this (file X86GenInstrInfo.inc): ``` static bool isThreeOperandsLEA(const MachineInstr &MI) { return ( ( MI.getOpcode() == X86::LEA32r || MI.getOpcode() == X86::LEA64r || MI.getOpcode() == X86::LEA64_32r || MI.getOpcode() == X86::LEA16r ) && MI.getOperand(1).isReg() && MI.getOperand(1).getReg() != 0 && MI.getOperand(3).isReg() && MI.getOperand(3).getReg() != 0 && ( ( MI.getOperand(4).isImm() && MI.getOperand(4).getImm() != 0 ) || (MI.getOperand(4).isGlobal()) ) ); } ``` A similar method is generated in the X86_MC namespace, and included into X86MCTargetDesc.cpp (the declaration lives in X86MCTargetDesc.h). Back to the BtVer2 scheduling model: A new scheduling predicate named JSlowLEAPredicate now checks if either the instruction is a three-operands LEA, or it is an LEA with a Scale value different than 1. A variant scheduling class uses that new predicate to correctly select the appropriate latency profile. Differential Revision: https://reviews.llvm.org/D49436 llvm-svn: 337469
* ARM: switch armv7em MachO triple to hard-float defaults and libcalls.Tim Northover2018-07-191-0/+2
| | | | | | | | | We were emitting incorrect calls to libm functions that LLVM had decided it knew about because the default is soft-float. Recommitted without breaking ELF this time. llvm-svn: 337450
* [x86/SLH] Major refactoring of SLH implementaiton. There are two bigChandler Carruth2018-07-191-174/+204
| | | | | | | | | | | | | | | | | | | | | | | | | | | | changes that are intertwined here: 1) Extracting the tracing of predicate state through the CFG to its own function. 2) Creating a struct to manage the predicate state used throughout the pass. Doing #1 necessitates and motivates the particular approach for #2 as now the predicate management is spread across different functions focused on different aspects of it. A number of simplifications then fell out as a direct consequence. I went with an Optional to make it more natural to construct the MachineSSAUpdater object. This is probably the single largest outstanding refactoring step I have. Things get a bit more surgical from here. My current goal, beyond generally making this maintainable long-term, is to implement several improvements to how we do interprocedural tracking of predicate state. But I don't want to do that until the predicate state management and tracing is in reasonably clear state. Differential Revision: https://reviews.llvm.org/D49427 llvm-svn: 337446
* Fix spelling mistake in comments. NFCI.Simon Pilgrim2018-07-191-2/+2
| | | | llvm-svn: 337442
* [WebAssembly] Add missing -mattr=+exception-handling guardsHeejin Ahn2018-07-181-0/+3
| | | | | | | | | | | | | | Summary: The use of exception handling instructions should only be enabled with `-mattr=+exception-handling` option. Reviewers: jgravelle-google Subscribers: dschuff, sbc100, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D49391 llvm-svn: 337425
OpenPOWER on IntegriCloud