path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [X86][NFC] Fix harmless typos in BDW/ZnVer1 sched models. | Clement Courbet | 2018-06-07 | 2 | -11/+11
  See D46356 for context.
  llvm-svn: 334164
* [SystemZ] Build Load And Test from scratch in convertToLoadAndTest. | Jonas Paulsson | 2018-06-07 | 1 | -10/+16
  This is needed to get the CC operand in the right place, as expected by the SchedModel.
  Review: Ulrich Weigand
  https://reviews.llvm.org/D47820
  llvm-svn: 334161
* [AMDGPU] Improve reciprocal handling | Stanislav Mekhanoshin | 2018-06-06 | 1 | -7/+13
  When denormals are supported we produce a full division for 1.0f / x. That can still be replaced by the faster version:

    bool c = fabs(x) > 0x1.0p+96f;
    float s = c ? 0x1.0p-32f : 1.0f;
    x *= s;
    return s * v_rcp_f32(x)

  if the requested accuracy is 2.5 ulp or less. The same version is used for non-1.0 numerators when denormals are not supported; for a 1.0 numerator, plain v_rcp_f32 is then used.
  The optimization of 1/x is extended to the case -1/x, which is the same except for the resulting sign bit.
  OpenCL conformance passed with both enabled and disabled denorms.
  Differential Revision: https://reviews.llvm.org/D47805
  llvm-svn: 334142
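The scaling step above can be sketched in Python. This is only a model under stated assumptions: `rcp_flush` is a hypothetical stand-in for `v_rcp_f32` that is assumed to flush denormal results to zero, and `f32` emulates single-precision rounding with `struct`. None of these helper names come from the patch.

```python
import struct

MIN_NORMAL_F32 = 2.0 ** -126  # smallest normal single-precision magnitude


def f32(v: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def rcp_flush(x: float) -> float:
    """Hypothetical model of v_rcp_f32: assumed to flush denormal results."""
    r = f32(1.0 / x)
    return 0.0 if abs(r) < MIN_NORMAL_F32 else r


def scaled_rcp(x: float) -> float:
    """1.0f / x using the scale factor from the commit message."""
    c = abs(x) > 2.0 ** 96
    s = 2.0 ** -32 if c else 1.0
    x = f32(x * s)               # bring a huge x back into a safe range
    return f32(s * rcp_flush(x)) # undo the scaling on the result
```

With this model, a very large input such as `2.0 ** 127` yields a denormal reciprocal through `scaled_rcp`, whereas the unscaled `rcp_flush` returns zero.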
* AMDGPU: Custom lower v2f16 fneg/fabs with illegal f16 | Matt Arsenault | 2018-06-06 | 2 | -0/+30
  Fixes terrible code on targets without f16 support. The legalization creates a mess that is difficult to recover from. Also should avoid randomly breaking these tests multiple times in sequence in future commits.
  Some regressions in cases where it happens to be better to pull the source modifier after the conversion.
  llvm-svn: 334132
* [X86] Emit BZHI when mask is ~(-1 << nbits) | Roman Lebedev | 2018-06-06 | 1 | -13/+41
  Summary:
  In D47428, I propose to choose `~(-(1 << nbits))` as the canonical form of low-bit-mask formation. As seen from these tests, there is a reason for that. AArch64 currently handles `~(-(1 << nbits))` better than the more traditional `(1 << nbits) - 1` (sic!). It is the other way around for X86. It would be much better to canonicalize.
  This patch is completely monkey-typing. I don't really understand how this works :) I have based it on the `// x & (-1 >> (32 - y))` pattern.
  Also, when we only have `BMI`, I wonder if we could use `BEXTR` with `start=0`?
  Related links:
  https://bugs.llvm.org/show_bug.cgi?id=36419
  https://bugs.llvm.org/show_bug.cgi?id=37603
  https://bugs.llvm.org/show_bug.cgi?id=37610
  https://rise4fun.com/Alive/idM
  Reviewers: craig.topper, spatel, RKSimon, javed.absar
  Reviewed By: craig.topper
  Subscribers: kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47453
  llvm-svn: 334125
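As a quick check of why these mask forms are interchangeable, here is a small Python sketch for 32-bit values. The function names are illustrative only, and `bzhi32` is a simplified model of the instruction's result (flags are ignored).

```python
MASK32 = 0xFFFFFFFF


def mask_sub(nbits: int) -> int:
    """Traditional form: (1 << nbits) - 1."""
    return ((1 << nbits) - 1) & MASK32


def mask_not_neg(nbits: int) -> int:
    """Form from the summary: ~(-(1 << nbits))."""
    return ~(-(1 << nbits)) & MASK32


def mask_not_shl(nbits: int) -> int:
    """Form from the title: ~(-1 << nbits)."""
    return ~(-1 << nbits) & MASK32


def bzhi32(x: int, nbits: int) -> int:
    """Simplified BZHI model: zero all bits of x from position nbits upward."""
    idx = nbits & 0xFF            # only the low 8 bits of the index are used
    if idx >= 32:                 # out-of-range index leaves the source intact
        return x & MASK32
    return x & ((1 << idx) - 1)
```

For every `nbits` in 0..31, `x & mask` with any of the three masks matches `bzhi32(x, nbits)`, which is the rewrite the patch title describes.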
* [Hexagon] Implement vector-pair zero as V6_vsubw_dv | Krzysztof Parzyszek | 2018-06-06 | 3 | -4/+17
  llvm-svn: 334123
* [X86] Properly disassemble gather/scatter instructions where xmm4/ymm4/zmm4 are used as the index. | Craig Topper | 2018-06-06 | 1 | -1/+1
  These encodings correspond to the cases in the normal encoding scheme where there is no index and our modrm reading code initially decodes it as such. The VSIB handling code tried to compensate for this, but failed to add the base needed to make later code do the right thing.
  Fixes PR37712.
  llvm-svn: 334121
* [X86] Rename vy512mem->vy512xmem and vz256xmem->vz256mem. | Craig Topper | 2018-06-06 | 2 | -14/+14
  The index size is represented by the letter after the 'v'. The number represents the memory size. If an 'x' appears after the number, it means the index register can be from VR128X/VR256X instead of VR128/VR256.
  As vy512mem uses a VR256X index it should have an x. And vz256mem uses a VR512 index so it shouldn't have an x.
  I admit these names kind of suck and are confusing.
  llvm-svn: 334120
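The naming scheme described above can be decoded mechanically. The sketch below is illustrative only: `describe_operand` is a hypothetical helper, and mapping the index letters x/y/z to 128/256/512-bit register classes is an assumption inferred from the two examples in the message.

```python
import re

# v <index letter> <memory size in bits> [x] mem
_NAME = re.compile(r"^v([xyz])(\d+)(x?)mem$")
_INDEX_BITS = {"x": 128, "y": 256, "z": 512}  # assumed letter-to-width mapping


def describe_operand(name: str) -> tuple[str, int]:
    """Return (index register class, memory size in bits) for an operand name."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"not a vector-index memory operand name: {name!r}")
    letter, mem_bits, extended = m.groups()
    index_class = f"VR{_INDEX_BITS[letter]}" + ("X" if extended else "")
    return index_class, int(mem_bits)
```

Under these assumptions, `describe_operand("vy512xmem")` gives `("VR256X", 512)` and `describe_operand("vz256mem")` gives `("VR512", 256)`, matching the renames in this commit.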
* [X86][BtVer2] Add support for all vector instructions that should match the dependency-breaking 'zero-idiom' | Simon Pilgrim | 2018-06-06 | 1 | -4/+32
  As detailed in Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), all these instructions are dependency breaking and zero the destination register.
  llvm-svn: 334119
* [AArch64, ARM] Add support for Samsung Exynos M4 | Evandro Menezes | 2018-06-06 | 2 | -0/+7
  Create a separate feature set for Exynos M4 and add test cases.
  llvm-svn: 334115
* [Hexagon] Split CTPOP of vector pairs | Krzysztof Parzyszek | 2018-06-06 | 1 | -0/+1
  llvm-svn: 334109
* Change TII isCopyInstr way of returning arguments (NFC) | Petar Jovanovic | 2018-06-06 | 8 | -29/+33
  Make TII isCopyInstr() return MachineOperands through pointer to pointer instead of via reference.
  Patch by Nikola Prica.
  Differential Revision: https://reviews.llvm.org/D47364
  llvm-svn: 334105
* Fix MSVC '*/' found outside of comment warning. NFCI. | Simon Pilgrim | 2018-06-06 | 1 | -1/+1
  llvm-svn: 334086
* Fix compilation of WebAssembly and RISCV after r334078 | Ilya Biryukov | 2018-06-06 | 2 | -7/+16
  llvm-svn: 334085
* [X86][BMI][TBM] Only demand bottom 16-bits of the BEXTR control op (PR34042) | Simon Pilgrim | 2018-06-06 | 5 | -41/+86
  Only the bottom 16 bits of BEXTR's control op are required (7:0 INDEX, 15:8 LENGTH).
  Differential Revision: https://reviews.llvm.org/D47690
  llvm-svn: 334083
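A simplified Python model of the 32-bit BEXTR result shows why the upper control bits can be ignored. The function name is illustrative and flag behavior is not modeled.

```python
def bextr32(src: int, control: int) -> int:
    """Simplified 32-bit BEXTR: extract LENGTH bits of src starting at INDEX."""
    start = control & 0xFF           # bits 7:0 of the control operand
    length = (control >> 8) & 0xFF   # bits 15:8 of the control operand
    # Bits 31:16 of control are never read, so they cannot affect the result.
    return ((src & 0xFFFFFFFF) >> start) & ((1 << length) - 1)
```

For example, a control value of `(8 << 8) | 4` extracts eight bits starting at bit 4, and setting any of the upper 16 control bits leaves the result unchanged.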
* [MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup | Peter Smith | 2018-06-06 | 13 | -51/+100
  On targets like Arm some relaxations may only be performed when certain architectural features are available. As functions can be compiled with differing levels of architectural support we must make a judgement on whether we can relax based on the MCSubtargetInfo for the function. This change passes through the MCSubtargetInfo for the function to fixupNeedsRelaxation so that the decision on whether to relax can be made per function. In this patch, only the ARM backend makes use of this information.
  We must also pass the MCSubtargetInfo to applyFixup because some fixups skip error checking on the assumption that relaxation has occurred. To prevent code-generation errors, applyFixup must see the same MCSubtargetInfo as fixupNeedsRelaxation.
  Differential Revision: https://reviews.llvm.org/D44928
  llvm-svn: 334078
* [MIPS GlobalISel] Add lowerCall | Petar Jovanovic | 2018-06-06 | 2 | -1/+133
  Add minimal support to lower function calls. Support only functions with arguments/return that go through registers and have type i32.
  Patch by Petar Avramovic.
  Differential Revision: https://reviews.llvm.org/D45627
  llvm-svn: 334071
* [Mips] Remove unneeded variants of ADDC/ADDE lowering | Amaury Sechet | 2018-06-05 | 4 | -48/+5
  Summary: As it turns out, the lowering for the Mips16* family of targets is exactly what the ops expand to, so the code handling them can be removed and the ops only enabled for the MipsSE* family of targets.
  Reviewers: smaksimovic, atanasyan, abeserminji
  Subscribers: sdardis, arichardson, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47703
  llvm-svn: 334052
* AMDGPU: Preserve metadata when widening loads | Matt Arsenault | 2018-06-05 | 1 | -2/+23
  Preserves the low bound of the !range. I don't think it's legal to do anything with the top half since it's theoretically reading garbage.
  llvm-svn: 334045
* AMDGPU: Use more custom insert/extract_vector_elt lowering | Matt Arsenault | 2018-06-05 | 1 | -14/+31
  Apply to i8 vectors.
  llvm-svn: 334044
* [Hexagon] Add pattern to generate 64-bit neg instruction | Krzysztof Parzyszek | 2018-06-05 | 1 | -4/+5
  llvm-svn: 334043
* [Hexagon] Add more patterns for generating abs/absp instructions | Krzysztof Parzyszek | 2018-06-05 | 1 | -5/+15
  llvm-svn: 334038
* [mips] Fix the predicates for arithmetic operations | Simon Dardis | 2018-06-05 | 2 | -51/+55
  Reviewers: smaksimovic, atanasyan, abeserminji
  Differential Revision: https://reviews.llvm.org/D47635
  llvm-svn: 334031
* [X86][SSE] Use multiplication scale factors for v8i16 SHL on pre-AVX2 targets. | Simon Pilgrim | 2018-06-05 | 1 | -3/+21
  Similar to v4i32 SHL, convert v8i16 shift amounts to scale factors instead to improve performance and reduce instruction count. We were already doing this for constant shifts; this adds variable shift support.
  Reduces the serial nature of the codegen, which relies on chains of pblendvb/pand+pandn+por shifts.
  This is a step towards adding support for vXi16 vector rotates.
  Differential Revision: https://reviews.llvm.org/D47546
  llvm-svn: 334023
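The arithmetic identity behind this lowering is that a left shift by `n` equals a multiply by `2**n`, truncated to the lane width. A minimal Python sketch on eight 16-bit lanes follows. The function names are illustrative, and how the backend actually materializes the scale factors is not shown here.

```python
LANE_MASK = 0xFFFF  # each v8i16 lane is 16 bits wide


def shl_v8i16_reference(values, amounts):
    """Per-lane left shift, truncated to 16 bits."""
    return [(v << a) & LANE_MASK for v, a in zip(values, amounts)]


def shl_v8i16_via_mul(values, amounts):
    """Same result using per-lane scale factors and one multiply."""
    scales = [(1 << a) & LANE_MASK for a in amounts]  # 2**amount per lane
    return [(v * s) & LANE_MASK for v, s in zip(values, scales)]
```

The two functions agree for every shift amount from 0 to 15. Larger amounts are out of range for a 16-bit shift in LLVM IR, so they are not a concern for the lowering.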
* [MC][X86] Allow assembler variable assignment to register name. | Nirav Dave | 2018-06-05 | 2 | -0/+100
  Summary:
  Allow extended parsing of variable assembler assignment syntax and modify X86 to permit VAR = register assignment. As we emit these as .set directives when possible, we inline such expressions in output assembly.
  Fixes PR37425.
  Reviewers: rnk, void, echristo
  Reviewed By: rnk
  Subscribers: nickdesaulniers, llvm-commits, hiraditya
  Differential Revision: https://reviews.llvm.org/D47545
  llvm-svn: 334022
* [X86] NFC Fix typo introduced in r328016 HSI->HDI | Gabor Buella | 2018-06-05 | 1 | -1/+1
  llvm-svn: 334016
* [Hexagon] Minor cleanups in isel lowering | Krzysztof Parzyszek | 2018-06-05 | 1 | -9/+8
  llvm-svn: 334015
* [PowerPC] reduce rotate in BitPermutationSelector | Hiroshi Inoue | 2018-06-05 | 1 | -1/+7
  BitPermutationSelector builds the output value by repeating rotate-and-mask instructions with input registers. Here, we may avoid one rotate instruction if we start building from an input register that does not require rotation.
  For example, in the test case bitfieldinsert.ll, it first rotates r4 left by 8 bits and then inserts some bits from r5 without rotation. This can be executed by one rlwimi instruction, which rotates r4 by 8 bits and inserts its bits into r5.
  This patch adds a check for rotation amounts in the comparator used in sorting, so that the input without rotation is processed first.
  Differential Revision: https://reviews.llvm.org/D47765
  llvm-svn: 334011
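To see why starting from the unrotated input saves an instruction, here is a small Python model of rotate-and-insert. It is a sketch only: `rlwimi` below takes an explicit bit mask rather than the instruction's MB/ME fields, and the register values are made up.

```python
M32 = 0xFFFFFFFF


def rotl32(v: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    v &= M32
    n %= 32
    return ((v << n) | (v >> (32 - n))) & M32


def rlwimi(dst: int, src: int, shift: int, mask: int) -> int:
    """Rotate src left, then insert its bits into dst under mask."""
    return (rotl32(src, shift) & mask) | (dst & ~mask & M32)


def two_instructions(r4: int, r5: int, r5_mask: int) -> int:
    """Start from r4: one rotate, then one insert of r5's bits."""
    tmp = rotl32(r4, 8)
    return rlwimi(tmp, r5, 0, r5_mask)


def one_instruction(r4: int, r5: int, r5_mask: int) -> int:
    """Start from r5 (no rotation needed): a single rotate-and-insert of r4."""
    return rlwimi(r5, r4, 8, ~r5_mask & M32)
```

Both orderings compute the same value, so processing the input that needs no rotation first lets the initial rotate disappear.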
* [X86][SSE] Add target shuffle support to X86TargetLowering::computeKnownBitsForTargetNode | Simon Pilgrim | 2018-06-05 | 1 | -0/+51
  Ideally we'd use resolveTargetShuffleInputs to handle faux shuffles as well but:
  (a) that code path doesn't handle general/pre-legalized ops/types very well.
  (b) I'm concerned about the compute time as they recurse to calls to computeKnownBits/ComputeNumSignBits which would need depth limiting somehow.
  llvm-svn: 334007
* [X86] NFC Refactor some code in InstPrinters | Gabor Buella | 2018-06-05 | 7 | -261/+199
  Summary: Bring some code duplicated in the AT&T and the Intel printers into a common parent class.
  Reviewers: craig.topper
  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D47682
  llvm-svn: 334005
* [MC][ARM] Add range checking for Thumb2 resolved fixups. | Peter Smith | 2018-06-05 | 1 | -0/+10
  When the branch target of a Thumb2 unconditional or conditional branch is resolved at assembly time, no range checking is performed on the result, leading to incorrect immediates. This change adds a range check: +- 16 Megabytes for unconditional branches, +- 1 Megabyte for the conditional branch.
  Differential Revision: https://reviews.llvm.org/D46306
  llvm-svn: 333997
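The check described above amounts to comparing the resolved offset against a signed limit. A minimal Python sketch follows, using the limits quoted in the message. The function name is hypothetical, and treating the range as the half-open interval `[-limit, limit)` is an assumption about the signed immediate encoding.

```python
MIB = 1 << 20  # one megabyte, in bytes


def thumb2_branch_in_range(offset: int, conditional: bool) -> bool:
    """Return True if a resolved Thumb2 branch offset fits its immediate."""
    limit = 1 * MIB if conditional else 16 * MIB
    # Assumed signed range: the most negative value fits, +limit does not.
    return -limit <= offset < limit
```

An assembler would report an error instead of silently encoding a wrapped immediate when this predicate is false.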
* [X86][SSE] Add basic PACKUS support to X86TargetLowering::computeKnownBitsForTargetNode | Simon Pilgrim | 2018-06-05 | 1 | -0/+13
  Helps improve analysis of saturation ops.
  llvm-svn: 333995
* [MC][ARM] Correct Thumb BL instruction range | Peter Smith | 2018-06-05 | 1 | -3/+5
  The Thumb BL range is + or - either 16 Megabytes or 4 Megabytes depending on whether the CPU supports Thumb2 or the v8-m baseline ops. The existing check for BL range is incorrectly set at +- 32 Megabytes. This change corrects the higher range and uses the lower range if the featurebits don't have the necessary support for it.
  Differential Revision: https://reviews.llvm.org/D46305
  llvm-svn: 333991
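The feature-dependent limit can be sketched as follows. The boolean parameters are hypothetical stand-ins for the subtarget feature bits, and the half-open range is an assumption about the signed encoding.

```python
MIB = 1 << 20  # one megabyte, in bytes


def bl_range_limit(has_thumb2: bool, has_v8m_baseline: bool) -> int:
    """Pick the Thumb BL reach from the (hypothetical) feature flags."""
    wide = has_thumb2 or has_v8m_baseline
    return (16 if wide else 4) * MIB


def bl_in_range(offset: int, has_thumb2: bool, has_v8m_baseline: bool) -> bool:
    """Return True if a resolved BL offset fits the selected range."""
    limit = bl_range_limit(has_thumb2, has_v8m_baseline)
    return -limit <= offset < limit
```

Under this model an offset of 20 megabytes, which the old +- 32 Megabyte check would have accepted, is rejected in both configurations.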
* [X86][CET] Shadow stack fix for setjmp/longjmp | Alexander Ivchenko | 2018-06-05 | 2 | -5/+257
  This is the new version of D46181, allowing setjmp/longjmp to work correctly with the Intel CET shadow stack by storing SSP on setjmp and fixing it on longjmp. The patch has been updated to use the cf-protection-return module flag instead of HasSHSTK, and the bug that caused D46181 to be reverted has been fixed with the test expanded to track that fix.
  Patch by mike.dvoretsky.
  Differential Revision: https://reviews.llvm.org/D47311
  llvm-svn: 333990
* [X86] Make all instructions that operate on MMX types, but were added after the initial MMX support via one of the SSE feature flags, require the MMX feature as well. | Craig Topper | 2018-06-05 | 2 | -18/+18
  Passing -mattr=-mmx needs to disable these instructions since the MMX register class won't have been set up. But we don't want -mattr=-mmx to disable SSE, so we have to do it separately.
  llvm-svn: 333984
* Simplified WebAssemblyAsmBackend by removing explicit ELF variant. | Derek Schuff | 2018-06-04 | 1 | -82/+8
  The ELF version was broken (it does not deal with wasm-specific fixups), and now is slightly less broken. It will be removed in its entirety in the future, which this change makes slightly easier (just remove the IsELF bool).
  Differential Revision: https://reviews.llvm.org/D47745
  Patch by Wouter van Oortmerssen
  llvm-svn: 333964
* Move Analysis/Utils/Local.h back to Transforms | David Blaikie | 2018-06-04 | 6 | -6/+6
  Review feedback from r328165. Split out just the one function from the file that's used by Analysis. (As chandlerc pointed out, the original change only moved the header and not the implementation anyway - which was fine for the one function that was used (since it's a template/inlined in the header) but not in general)
  llvm-svn: 333954
* [MachineOutliner] NFC - Move intermediate data structures to MachineOutliner.h | Jessica Paquette | 2018-06-04 | 4 | -88/+81
  This is setting up to fix bug 37573 cleanly.
  This moves data structures that are technically both used in some way by the target and the general-purpose outlining algorithm into MachineOutliner.h. In particular, the `Candidate` class is of importance.
  Before, the outliner passed the locations of `Candidates` to the target, which would then make some decisions about the prospective outlined function. This change allows us to just pass `Candidates` along to the target. This will allow the target to discard `Candidates` that would be considered unsafe before cost calculation. Thus, we will be able to remove the unsafe candidates described in the bug without resorting to torching the entire prospective function.
  Also, as a side-effect, it makes the outliner a bit cleaner.
  https://bugs.llvm.org/show_bug.cgi?id=37573
  llvm-svn: 333952
* [X86][ELF][CET] Adding the .note.gnu.property ELF section in X86 | Alexander Ivchenko | 2018-06-04 | 1 | -0/+38
  In preparation for the proposed linker ABI changes (https://github.com/hjl-tools/linux-abi/wiki/linux-abi-draft.pdf, https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-cet.pdf), this patch enables emission of the .note.gnu.property section to ELF object files when building CET-enabled modules.
  Patch by mike.dvoretsky.
  Differential Revision: https://reviews.llvm.org/D47145
  llvm-svn: 333951
* [X86] Don't pass ParitySrc array into isAddSubOrSubAddMask. Instead use a bool output parameter to get the real piece of info we care about. NFC | Craig Topper | 2018-06-04 | 1 | -8/+10
  The ParitySrc array is more of an implementation detail. A single bool to get the final parity is sufficient.
  llvm-svn: 333935
* [AMDGPU] Small refactoring in the scheduler | Stanislav Mekhanoshin | 2018-06-04 | 1 | -18/+3
  After last changes some code can be simplified.
  Differential Revision: https://reviews.llvm.org/D47661
  llvm-svn: 333934
* [AMDGPU] Factored out common part of GCNRPTracker::reset() | Stanislav Mekhanoshin | 2018-06-04 | 2 | -11/+17
  Differential Revision: https://reviews.llvm.org/D47664
  llvm-svn: 333931
* [WebAssembly] Fix .td files after rL333900 | Sam Clegg | 2018-06-04 | 3 | -33/+33
  Differential Revision: https://reviews.llvm.org/D47727
  llvm-svn: 333928
* [AMDGPU][Waitcnt] Fix handling of flat instrs | Mark Searles | 2018-06-04 | 2 | -6/+14
  On GFX9 and earlier, flat memory ops may decrement VMCNT out-of-order as well as LGKMCNT out-of-order.
  Differential Revision: https://reviews.llvm.org/D46616
  llvm-svn: 333926
* [X86] Only accept const SelectionDAG to resolveTargetShuffleInputs/getFauxShuffleMask | Simon Pilgrim | 2018-06-04 | 1 | -2/+2
  These methods should only be using SelectionDAG for analysis (known/sign bits etc), not node creation.
  llvm-svn: 333925
* [NVPTX] Delete dead code from the AsmPrinter. | Benjamin Kramer | 2018-06-04 | 2 | -142/+0
  llvm-svn: 333924
* [RFC][patch 3/3] Add support for variant scheduling classes in llvm-mca. | Andrea Di Biagio | 2018-06-04 | 2 | -1/+37
  This patch is the last of a sequence of three patches related to LLVM-dev RFC "MC support for variant scheduling classes".
  http://lists.llvm.org/pipermail/llvm-dev/2018-May/123181.html
  This fixes PR36672.
  The main goal of this patch is to teach llvm-mca how to resolve variant scheduling classes. This patch does that, plus it adds new variant scheduling classes to the BtVer2 scheduling model to identify so-called zero-idioms (i.e. dependency-breaking instructions that are known to generate zero, and that are optimized out in hardware at the register renaming stage). Without the BtVer2 change, this patch would not have had any meaningful tests.
  This patch is effectively the union of two changes:
  1) a change that teaches llvm-mca how to resolve variant scheduling classes.
  2) a change to the BtVer2 scheduling model that allows us to special-case packed XOR zero-idioms (this partially fixes PR36671).
  Differential Revision: https://reviews.llvm.org/D47374
  llvm-svn: 333909
* AMDGPU: Make various NamedOperands upper case | Nicolai Haehnle | 2018-06-04 | 4 | -43/+43
  Summary: Avoid name clashes with the corresponding bit fields in the instruction encoding.
  Change-Id: Id1644e703e976e78f7af93788d9f44cb48c3251f
  Reviewers: arsenm, rampitec, kzhuravl
  Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47433
  llvm-svn: 333905
* TableGen: Streamline the semantics of NAME | Nicolai Haehnle | 2018-06-04 | 5 | -284/+298
  Summary:
  The new rules are straightforward. The main rules to keep in mind are:
  1. NAME is an implicit template argument of class and multiclass, and will be substituted by the name of the instantiating def/defm.
  2. The name of a def/defm in a multiclass must contain a reference to NAME. If such a reference is not present, it is automatically prepended.
  And for some additional subtleties, consider these:
  3. defm with no name generates a unique name but has no special behavior otherwise.
  4. def with no name generates an anonymous record, whose name is unique but undefined. In particular, the name won't contain a reference to NAME.
  Keeping rules 1&2 in mind should allow a predictable behavior of name resolution that is simple to follow.
  The old "rules" were rather surprising: sometimes (but not always), NAME would correspond to the name of the toplevel defm. They were also plain bonkers when you pushed them to their limits, as the old version of the TableGen test case shows.
  Having NAME correspond to the name of the toplevel defm introduces "spooky action at a distance" and breaks composability: refactoring the upper layers of a hierarchy of nested multiclass instantiations can cause unexpected breakage by changing the value of NAME at a lower level of the hierarchy. The new rules don't suffer from this problem.
  Some existing .td files have to be adjusted because they ended up depending on the details of the old implementation.
  Change-Id: I694095231565b30f563e6fd0417b41ee01a12589
  Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar
  Subscribers: wdng, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47430
  llvm-svn: 333900
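Rules 1 and 2 above can be illustrated with a toy Python model. This is not TableGen's implementation: def names are represented as plain strings in which the literal text `NAME` marks a reference, and `resolve_def_name` is a hypothetical helper.

```python
def resolve_def_name(def_name: str, instantiating_name: str) -> str:
    """Toy model of rules 1 and 2 for a def/defm inside a multiclass."""
    if "NAME" not in def_name:
        # Rule 2: a missing reference to NAME is prepended automatically.
        def_name = "NAME" + def_name
    # Rule 1: NAME is substituted by the name of the instantiating def/defm.
    return def_name.replace("NAME", instantiating_name)


def resolve_nested(inner: str, middle: str, top: str) -> str:
    """Nested instantiation: each level only sees its immediate instantiator."""
    return resolve_def_name(inner, resolve_def_name(middle, top))
```

In this model the inner name depends only on the name produced one level up, which is the composability property the summary argues for.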
* [mips] Restore the availability of trap for microMIPS | Simon Dardis | 2018-06-04 | 1 | -0/+1
  Reviewers: smaksimovic, atanasyan, abeserminji
  Differential Revision: https://reviews.llvm.org/D47584
  llvm-svn: 333895