summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt schedule classesSimon Pilgrim2018-05-0713-603/+228
| | | | | | | | | | | | | WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions. WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions. This removes all InstrRW overrides for these instructions. NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner. NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80. llvm-svn: 331629
* [SystemZ] Bugfix for MVCLoop CC clobbering.Jonas Paulsson2018-05-071-1/+1
| | | | | | | | MVCLoop clobbers CC (since it emits a compare/branch), but this was not modelled. Review: Ulrich Weigand llvm-svn: 331627
* [ARM] Select result 1 from ConvertBooleanCarryToCarryFlag's result ↵Amaury Sechet2018-05-071-4/+6
| | | | | | | | automatically. NFC The old behavior return the value 0, which is error prone. llvm-svn: 331614
* [X86] Fix copy/paste mistake in comment. NFCCraig Topper2018-05-071-1/+1
| | | | llvm-svn: 331611
* [X86] Enable reciprocal estimates for v16f32 vectors by using ↵Craig Topper2018-05-061-6/+10
| | | | | | | | | | | | | | | | | | | VRCP14PS/VRSQRT14PS Summary: The legacy VRCPPS/VRSQRTPS instructions aren't available in 512-bit versions. The new increased precision versions are. So we can use those to implement v16f32 reciprocal estimates. For KNL CPUs we can probably use VRCP28PS/VRSQRT28PS and avoid the NR step altogether, but I leave that for a future patch. Reviewers: spatel Reviewed By: spatel Subscribers: RKSimon, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D46498 llvm-svn: 331606
* [globalisel] Update GlobalISel emitter to match new representation of ↵Daniel Sanders2018-05-052-4/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | extending loads Summary: Previously, a extending load was represented at (G_*EXT (G_LOAD x)). This had a few drawbacks: * G_LOAD had to be legal for all sizes you could extend from, even if registers didn't naturally hold those sizes. * All sizes you could extend from had to be allocatable just in case the extend went missing (e.g. by optimization). * At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we improve optimization of extends and truncates, this legality requirement would spread without considerable care w.r.t when certain combines were permitted. * The SelectionDAG importer required some ugly and fragile pattern rewriting to translate patterns into this style. This patch changes the representation to: * (G_[SZ]EXTLOAD x) * (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits() which resolves these issues by allowing targets to work entirely in their native register sizes, and by having a more direct translation from SelectionDAG patterns. Each extending load can be lowered by the legalizer into separate extends and loads, however a target that supports s1 will need the any-extending load to extend to at least s8 since LLVM does not represent memory accesses smaller than 8 bit. The legalizer can widenScalar G_LOAD into an any-extending load but sign/zero-extending loads need help from something else like a combiner pass. A follow-up patch that adds combiner helpers for for this will follow. The new representation requires that the MMO correctly reflect the memory access so this has been corrected in a couple tests. I've also moved the extending loads to their own tests since they are (mostly) separate opcodes now. Additionally, the re-write appears to have invalidated two tests from select-with-no-legality-check.mir since the matcher table no longer contains loads that result in s1's and they aren't legal in AArch64 anymore. Depends on D45540 Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar Reviewed By: rtereshin Subscribers: javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D45541 llvm-svn: 331601
* Simplify LLVM_ATTRIBUTE_USED call sites.Fangrui Song2018-05-051-2/+2
| | | | llvm-svn: 331599
* Fix a bunch of places where operator-> was used directly on the return from ↵Craig Topper2018-05-057-13/+12
| | | | | | | | | | dyn_cast. Inspired by r331508, I did a grep and found these. Mostly just change from dyn_cast to cast. Some cases also showed a dyn_cast result being converted to bool, so those I changed to isa. llvm-svn: 331577
* AMDGPU/NFC: Update D16PreservesUnusedBits description based Tony Tye's commentsKonstantin Zhuravlyov2018-05-041-1/+3
| | | | llvm-svn: 331564
* AMDGPU/NFC: Fix formatting for 900, 902 ISA Version featuresKonstantin Zhuravlyov2018-05-041-4/+2
| | | | llvm-svn: 331553
* AMDGPU: Add D16 instructions preserve unused bits featureKonstantin Zhuravlyov2018-05-046-9/+27
| | | | | | | | | - Predicate D16 patterns on this new feature - Added this new feature to gfx900/2/4 Differential Revision: https://reviews.llvm.org/D46366 llvm-svn: 331551
* Fast Math Flag mapping into SDNodeMichael Berg2018-05-042-6/+5
| | | | | | | | | | | | | | Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage. Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar Reviewed By: spatel Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng Differential Revision: https://reviews.llvm.org/D45710 llvm-svn: 331547
* [X86] Add WriteEMMS scheduler classSimon Pilgrim2018-05-0412-34/+15
| | | | | | Filled in the missing values from Btver2 SoG or Agner llvm-svn: 331546
* [X86] Finish splitting WriteVecShift and WriteVecIMul to remove InstRW ↵Simon Pilgrim2018-05-0411-103/+44
| | | | | | overrides. llvm-svn: 331543
* [llvm-exegesis] Fix pfm counter names for BDW.Clement Courbet2018-05-041-8/+8
| | | | | | | | | | | | Summary: They are not consistent with other microarchitectures. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D46434 llvm-svn: 331532
* [X86] Cleanup SchedWriteFMA classes and use X86SchedWriteWidths directly.Simon Pilgrim2018-05-0412-82/+80
| | | | | | Rename scalar and XMM versions, this is to match/simplify an upcoming change to split MUL/DIV/SQRT scalar/xmm/ymm/zmm classes. llvm-svn: 331531
* [Hexagon] Remove leftover debugging code after r331527Krzysztof Parzyszek2018-05-041-1/+0
| | | | llvm-svn: 331528
* [Hexagon] Handle non-immediate constants in HexagonSplitDoubleKrzysztof Parzyszek2018-05-042-24/+28
| | | | llvm-svn: 331527
* [mips] Correct the predicates of sign extension instructionsSimon Dardis2018-05-044-29/+5
| | | | | | | | | | And eliminatw the duplication of those instructions for microMIPS32r6. Reviewers: smaksimovic, abeserminji, atanasyan Differential Revision: https://reviews.llvm.org/D46117 llvm-svn: 331526
* [X86] Add WriteVecMOVMSKY scheduler classSimon Pilgrim2018-05-0411-40/+48
| | | | llvm-svn: 331525
* [AArch64] Custom Lower MULLH{S,U} for v16i8, v8i16, and v4i32Adhemerval Zanella2018-05-042-2/+89
| | | | | | | | | | | | | | This patch adds a custom lowering for ISD::MULH{S,U} used on divide by constant optimization (DAGCombiner::BuildSDIV and DAGCombiner::BuildUDIV). New patterns for smull and umull are added, so AArch64ISD::{S,U}MULL can be correctly lowered to smull2 and umull2. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46009 llvm-svn: 331522
* [Hexagon] Skip reserved physical registers when updating livenessKrzysztof Parzyszek2018-05-041-1/+8
| | | | llvm-svn: 331518
* [X86] Add SchedWriteFRnd fp rounding scheduler classesSimon Pilgrim2018-05-0413-164/+67
| | | | | | | | Split off from SchedWriteFAdd for fp rounding/bit-manipulation instructions. Fixes an issue on btver2 which only had the ymm version using the JSTC pipe instead of JFPA. llvm-svn: 331515
* AMDGPU: Make getSubRegFromChannel a static member of AMDGPURegisterInfoTom Stellard2018-05-035-9/+9
| | | | | | | | | | | | | | | | Summary: This makes is possible to have R600RegisterInfo and SIRegisterInfo not inherit from AMDGPURegisterInfo. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D46280 llvm-svn: 331490
* [X86] Add WriteDPPD/WriteDPPS dot product scheduler classesSimon Pilgrim2018-05-0311-232/+42
| | | | llvm-svn: 331489
* [X86][Znver1] Use SchedAlias to tag microcoded scheduler classesSimon Pilgrim2018-05-031-32/+30
| | | | | | | | Avoids extra entries in the class tables. Found a typo that missed the MMX_PHSUBSW instruction. llvm-svn: 331488
* [X86][AVX512] VPLZCNT instructions match SchedWriteVecIMul scheduling class ↵Simon Pilgrim2018-05-032-17/+4
| | | | | | not SchedWriteVecALU. llvm-svn: 331473
* [X86] Split WriteVecShift/WriteVarVecShift into MMX, XMM and YMM/ZMM ↵Simon Pilgrim2018-05-0314-597/+170
| | | | | | | | scheduler classes This took a bit of extra work as on Intel targets the old (V)PSLLDrr/(V)PSLLDrm style instructions act differently - I ended up creating WriteVecShiftImm classes for XMM/YMM/ZMM vector shift by immediate and retaining WriteVecShift as the default (used only by MMX) plus WriteVecShiftX/WriteVecShiftY. X86SchedWriteWidths hides most of this thank goodness. llvm-svn: 331472
* [X86] Split WriteVecALU/WritePHAdd into XMM and YMM/ZMM scheduler classesSimon Pilgrim2018-05-0311-754/+90
| | | | llvm-svn: 331453
* ARM: don't try to over-align large vectors as arguments.Tim Northover2018-05-032-0/+16
| | | | | | | | | | | | By default LLVM thinks very large vectors get aligned to their size when passed across functions. Unfortunately no-one told the ARM backend so it doesn't trigger stack realignment and so accesses can cause the usual misalignment issues (e.g. a data abort). This changes the ABI alignment to the stack alignment, which in practice (and as a bonus) also coincides with the alignment "natural" vectors get. llvm-svn: 331451
* Rename invariant.group.barrier to launder.invariant.groupPiotr Padlewski2018-05-031-2/+2
| | | | | | | | | | | | | | Summary: This is one of the initial commit of "RFC: Devirtualization v2" proposal: https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo/edit?usp=sharing Reviewers: rsmith, amharc, kuhar, sanjoy Subscribers: arsenm, nhaehnle, javed.absar, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D45111 llvm-svn: 331448
* [X86][AVX512] VPAVG instructions should be tagged as SchedWriteVecALUSimon Pilgrim2018-05-031-1/+1
| | | | llvm-svn: 331446
* [X86] Split WriteVecIMul/WriteVecPMULLD/WriteMPSAD/WritePSADBW into XMM and ↵Simon Pilgrim2018-05-0311-261/+94
| | | | | | | | YMM/ZMM scheduler classes Also retagged VDBPSADBW instructions as SchedWritePSADBW instead of SchedWriteVecIMul which matches the behaviour on SkylakeServer (the only thing that supports it...) llvm-svn: 331445
* [X86] Update MMX instructions to be tagged with X86SchedWriteWidths typesSimon Pilgrim2018-05-032-77/+84
| | | | llvm-svn: 331443
* [TableGen][NFC] Make ResourceCycles definitions more explicit.Clement Courbet2018-05-033-12/+12
| | | | | | https://reviews.llvm.org/D46356 llvm-svn: 331439
* Commit r331416 breaks the big-endian PPC bot. On the big endian build, weNemanja Ivanovic2018-05-031-0/+3
| | | | | | | actually encounter constants wider than 64-bits. Add the guard to prevent tripping the assert. llvm-svn: 331420
* [PowerPC] Implement isMaskAndCmp0FoldingBeneficialNemanja Ivanovic2018-05-022-0/+15
| | | | | | | | | | | Sinking the and closer to a compare against zero is beneficial on PPC as it allows us to emit record-form instructions. In the future, we may expand this to a larger set of operations that feed compares against zero since PPC has lots of record-form instructions. Differential revision: https://reviews.llvm.org/D46060 llvm-svn: 331416
* [PowerPC] No CTR loop if the candidate exiting block is in a different loopNemanja Ivanovic2018-05-021-0/+14
| | | | | | | | | | | | | | | | The CTR loops pass will insert the decrementing branch instruction in an exiting block for the loop being transformed. However if that block is part of another loop as well (whether a nested loop or with irreducible CFG), it is not valid to use that exiting block. In fact, if the loop hass irreducible CFG, we don't bother analyzing it and we just bail on the transformation. In practice, this doesn't lead to a noticeable reduction in the number of loops transformed by this pass. Fixes https://bugs.llvm.org/show_bug.cgi?id=37229 Differential Revision: https://reviews.llvm.org/D46162 llvm-svn: 331410
* [X86][SNB] Fix scheduling of MMX integer multiply instructions.Simon Pilgrim2018-05-021-8/+8
| | | | | | The entries were being bound to the wrong class. llvm-svn: 331388
* [X86] Split WriteShuffle/WriteVarShuffle + WriteBlend/WriteVarBlend into XMM ↵Simon Pilgrim2018-05-0210-136/+75
| | | | | | and YMM/ZMM scheduler classes llvm-svn: 331386
* [COFF, ARM64] Hook up a few remaining relocationsMartin Storsjo2018-05-021-0/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D46355 llvm-svn: 331384
* [AMDGPU] A trivial fix for a buildbot failure caused by "commit ↵Farhana Aleen2018-05-021-1/+1
| | | | | | | 224a839fcbbead221f872cd32a1dd0c308d37299". Author: FarhanaAleen llvm-svn: 331383
* [X86] Cleanup WriteFShuffle/WriteFVarShuffle (+256 variants) scheduler ↵Simon Pilgrim2018-05-025-252/+54
| | | | | | classes with more common default values llvm-svn: 331380
* Revert "[AMDGPU] performAddCombine should run after DAG is legalized."Farhana Aleen2018-05-021-1/+1
| | | | | | This reverts commit 6b97d2995566b4dddd6bf0d75579ff44501d4494. llvm-svn: 331371
* [X86] Convert most remaining XOP uses of X86SchedWritePair scheduler classes ↵Simon Pilgrim2018-05-021-88/+102
| | | | | | to X86SchedWriteWidths. llvm-svn: 331369
* [AMDGPU] performAddCombine should run after DAG is legalized.Farhana Aleen2018-05-021-1/+1
| | | | | | | | | | | | | | | | Summary: performAddCombine should run after DAG is legalized; Otherwise generic optimization in the DAGCombiner can optimize an addcarry+trunc into an addcarry instruction with illegal types. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D46337 llvm-svn: 331368
* Fix line-endings. NFCI.Simon Pilgrim2018-05-021-3/+3
| | | | llvm-svn: 331367
* Re-land rL331357 "[X86] Fix scheduling info for VMPSADBWYrmi."Clement Courbet2018-05-021-1/+1
| | | | | | | | Without the rebase mess. https://reviews.llvm.org/D46356 llvm-svn: 331362
* [X86] Cleanup WriteFMul scheduler classes with more common default valuesSimon Pilgrim2018-05-023-70/+14
| | | | | | Intel models were targeting x87 instead of packed sse. llvm-svn: 331360
* Revert rL331355 "[X86] Fix scheduling info for VMPSADBWYrmi."Clement Courbet2018-05-021-16/+5
| | | | | | It contains unrelated changes. llvm-svn: 331357
OpenPOWER on IntegriCloud