summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Add phony registers for high halves of regs with low halvesKrzysztof Parzyszek2018-07-022-37/+74
| | | | | | | | | | Add registers still missing after r328016 (D43353): - for bits 15-8 of SI, DI, BP, SP (*H), and R8-R15 (*BH), - for bits 31-16 of R8-R15 (*WH). Thanks to Craig Topper for pointing it out. llvm-svn: 336134
* [X86] Don't use aligned load/store instructions for fp128 if the load/store ↵Craig Topper2018-07-022-5/+18
| | | | | | | | | | isn't aligned. Similarily, don't fold fp128 loads into SSE instructions if the load isn't aligned. Unless we're targeting an AMD CPU that doesn't check alignment on arithmetic instructions. Should fix PR38001 llvm-svn: 336121
* [AArch64][GlobalISel] Any-extend vararg parameters to stack slot size on Darwin.Amara Emerson2018-07-021-0/+3
| | | | | | | | | | | We currently don't any-extend vararg parameters before storing them to the stack locations on Darwin. However, SelectionDAG however does this, and so user code is in the wild which inadvertently relies on this extension. This can manifest in cases where the value stored is (int)0, but the actual parameter is interpreted by va_arg as a pointer, and so not extending to 64 bits causes the callee to load additional undefined bits. llvm-svn: 336120
* [X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique valuesSimon Pilgrim2018-07-021-47/+22
| | | | | | We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case. llvm-svn: 336113
* [X86] Use addAliasForDirective to support the .word directive (reland)Alex Bradbury2018-07-021-25/+3
| | | | | | | | | | | | | | | | The X86 asm parser currently has custom parsing logic for .word. Rather than use this custom logic, we can just use addAliasForDirective to enable the reuse of AsmParser::parseDirectiveValue. See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon (rL332607) backends. Differential Revision: https://reviews.llvm.org/D47004 This is a fixed reland of rL336100. This should have been caught in pre-commit testing so apologies for the noise. llvm-svn: 336104
* Revert r336100Alex Bradbury2018-07-021-3/+25
| | | | | | This was a bad change. .word == 2byte on x86. llvm-svn: 336103
* [X86] Use addAliasForDirective to support the .word directiveAlex Bradbury2018-07-021-25/+3
| | | | | | | | | | | | | The X86 asm parser currently has custom parsing logic for .word. Rather than use this custom logic, we can just use addAliasForDirective to enable the reuse of AsmParser::parseDirectiveValue. See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon (rL332607) backends. Differential Revision: https://reviews.llvm.org/D47004 llvm-svn: 336100
* [AArch64][SVE] Asm: Support for (SQ)INCP/DECP (scalar, vector)Sander de Smalen2018-07-022-0/+94
| | | | | | | | | | | | | | | | | | | | | Increments/decrements the result with the number of active bits from the predicate. The inc/dec variants added are: - incp x0, p0.h (scalar) - incp z0.h, p0 (vector) The unsigned saturating inc/dec variants added are: - uqincp x0, p0.h (scalar) - uqincp w0, p0.h (scalar, 32bit) - uqincp z0.h, p0 (vector) The signed saturating inc/dec variants added are: - sqincp x0, p0.h (scalar) - sqincp x0, p0.h, w0 (scalar, 32bit) - sqincp z0.h, p0 (vector) llvm-svn: 336091
* [AArch64][SVE] Asm: Support for (saturating) vector INC/DEC instructions.Sander de Smalen2018-07-022-0/+49
| | | | | | | | | | | | | | | | | | | Increment/decrement vector by multiple of predicate constraint element count. The variants added by this patch are: - INCH, INCW, INC and (saturating): - SQINCH, SQINCW, SQINCD - UQINCH, UQINCW, UQINCW - SQDECH, SQINCW, SQINCD - UQDECH, UQINCW, UQINCW For example: incw z0.s, all, mul #4 llvm-svn: 336090
* [X86][BtVer2] Added Jaguar FPU Pipe0/1 uop counters to permit basic ↵Simon Pilgrim2018-07-021-0/+2
| | | | | | | | llvm-exegesis uop testing We don't have PMCs to cover many of the Jaguar resources but we can at least monitor the FPU issue pipes which give an indication of the fpu uop count, just not the execution resources. llvm-svn: 336089
* [Mips][FastISel] Do not duplicate condition while lowering branchesPetar Jovanovic2018-07-021-4/+1
| | | | | | | | | | | | | | | | This change fixes the issue that arises when we duplicate condition from the predecessor block. If the condition's arguments are not considered alive across the blocks, fast regalloc gets confused and starts generating reloads from the slots that have never been spilled to. This change also leads to smaller code given that, unlike on architectures with condition codes, on Mips we can branch directly on register value, thus we gain nothing by duplication. Patch by Dragan Mladjenovic. Differential Revision: https://reviews.llvm.org/D48642 llvm-svn: 336084
* [AArch64][SVE] Asm: Support for vector element compares (immediate).Sander de Smalen2018-07-022-0/+85
| | | | | | | | Compare vector elements with a signed/unsigned immediate, e.g. cmpgt p0.s, p0/z, z0.s, #-16 cmphi p0.s, p0/z, z0.s, #127 llvm-svn: 336081
* Reapply r334980 and r334983.Sander de Smalen2018-07-026-18/+154
| | | | | | | | | These patches were previously reverted as they led to buildbot time-outs caused by large switch statement in printAliasInstr when using UBSan and O3. The issue has been addressed with a workaround (r335525). llvm-svn: 336079
* [X86] Put some cases in switch statements back on one line to be more ↵Craig Topper2018-07-021-566/+186
| | | | | | | | compact and make it easier to see the similarities. NFC It looks like someone ran clang-format over this entire file which reformatted these switches into a multiline form. But I think the single line form is more useful here. llvm-svn: 336077
* [X86] Remove FMA3Info DenseMap. Break into sorted tables that we can binary ↵Craig Topper2018-07-024-234/+149
| | | | | | | | | | search. I separated out the rounding and broadcast groups into their own tables because it made the ordering in the main table easier. Further splitting of the tables might make it possible to directly index using bits from the TSFlags, but its probably not worth it right now. llvm-svn: 336075
* [PowerPC] Don't make it as pre-inc candidate if displacement isn't 4's ↵QingShan Zhang2018-07-021-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | multiple for i64 pre-inc load/store For the below case, pre-inc prep think it's a good candidate to use pre-inc for the bucket, but 64bit integer load/store update (pre-inc) instruction on Power requires the displacement field should be DS-form (4's multiple). Since it can't satisfy the constraint, we have to do some fix ups later. As below, the original load/stores could be well-form, it makes things worse. unsigned long long result = 0; unsigned long long foo(char *p, unsigned long long n) { for (unsigned long long i = 0; i < n; i++) { unsigned long long x1 = *(unsigned long long *)(p - 50000 + i); unsigned long long x2 = *(unsigned long long *)(p - 61024 + i); unsigned long long x3 = *(unsigned long long *)(p - 62048 + i); unsigned long long x4 = *(unsigned long long *)(p - 64096 + i); result *= x1 * x2 * x3 * x4; } return result; } Patch by jedilyn(Kewen Lin). Differential Revision: https://reviews.llvm.org/D48813 --This line, and those below, will be ignored-- M lib/Target/PowerPC/PPCLoopPreIncPrep.cpp A test/CodeGen/PowerPC/preincprep-i64-check.ll llvm-svn: 336074
* Implement strip.invariant.groupPiotr Padlewski2018-07-021-0/+2
| | | | | | | | | | | | | | | | Summary: This patch introduce new intrinsic - strip.invariant.group that was described in the RFC: Devirtualization v2 Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits Differential Revision: https://reviews.llvm.org/D47103 Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com> llvm-svn: 336073
* [X86] Remove the places that return nullptr from ↵Craig Topper2018-07-011-44/+10
| | | | | | | | X86InstrInfo::commuteInstructionImpl. findCommutedOpIndices does the pre-checking for whether commuting is possible. There should be no reason left to fail in commuteInstructionImpl. There was a missing pre-check that I've added there and changed the check to an assert in commuteInstructionImpl. llvm-svn: 336070
* [X86][Disassembler] Remove TYPE_BNDR from translateImmediate.Craig Topper2018-07-011-2/+0
| | | | | | I've check the disassembler tables and this shouldn't be reachable. Which is good since if it was reachable there should have been a 'return' after the addOperand line. llvm-svn: 336066
* [UnrollAndJam] New Unroll and Jam passDavid Green2018-07-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a simple implementation of the unroll-and-jam classical loop optimisation. The basic idea is that we take an outer loop of the form: for i.. ForeBlocks(i) for j.. SubLoopBlocks(i, j) AftBlocks(i) Instead of doing normal inner or outer unrolling, we unroll as follows: for i... i+=2 ForeBlocks(i) ForeBlocks(i+1) for j.. SubLoopBlocks(i, j) SubLoopBlocks(i+1, j) AftBlocks(i) AftBlocks(i+1) Remainder Loop So we have unrolled the outer loop, then jammed the two inner loops into one. This can lead to a simpler inner loop if memory accesses can be shared between the now jammed loops. To do this we have to prove that this is all safe, both for the memory accesses (using dependence analysis) and that ForeBlocks(i+1) can move before AftBlocks(i) and SubLoopBlocks(i, j). Differential Revision: https://reviews.llvm.org/D41953 llvm-svn: 336062
* [X86] Remove unnecessary include. NFCCraig Topper2018-07-011-1/+0
| | | | | | Leftover from when the pass contained a DenseMap before it switched to binary search. llvm-svn: 336057
* [X86] Move the memory unfolding table creation into its own class and make ↵Craig Topper2018-07-015-5499/+5554
| | | | | | | | | | | | it a ManagedStatic. Also move the static folding tables, their search functions and the new class into new cpp/h files. The unfolding table is effectively static data. It's just a different ordering and a subset of the static folding tables. By putting it in a separate ManagedStatic we ensure we only have one copy instead of one per X86InstrInfo object. This way also makes it only get initialized when really needed. llvm-svn: 336056
* [X86] Move the X86InstrFMA3Info class into the cpp file. Expose only a ↵Craig Topper2018-06-303-48/+36
| | | | | | | | | | getFMA3Group free function. NFCI The class only exists to hold a DenseMap and is only created as a ManagedStatic. It used to expose a single static method that outside code was expected to use. This patch moves that static function out of the class and moves it implementation into the cpp file. It can now access the ManagedStatic directly by name without the need for the other static method that accessed the ManagedStatic. llvm-svn: 336055
* [X86] Remove the AsmName from the HAX,HDX,HCX,HBX,HSI,HDI,HBP,HSP,HIP ↵Craig Topper2018-06-301-9/+9
| | | | | | | | artificial registers so they can't be parsed by the assembly parser. There are no instructions that use them so they weren't causing any bad matches. But they weren't being diagnosed as "invalid register name" if they were used and would instead trigger some form of invalid operand. llvm-svn: 336054
* [X86] Use MVT::i8 for scalar shift amounts since that is what they ↵Craig Topper2018-06-301-5/+5
| | | | | | | | | | ultimately need to legalize to. I believe all of these are constants so legalizing them should be pretty trivial, but this saves a step. In one case it looks like we may have been creating a shift amount larger than the shift input itself. llvm-svn: 336052
* [X86] When combining load to BZHI, make sure we create the shift instruction ↵Craig Topper2018-06-301-4/+5
| | | | | | | | | | with an i8 type. This combine runs pretty late and causes us to introduce a shift after the op legalization phase has run. We need to be sure we create the shift with the proper type for the shift amount. If we don't do this, we will still re-legalize the operation properly, but we won't get a chance to fully optimize the truncate that gets inserted. So this patch adds the necessary truncate when the shift is created. I've also narrowed the subtract that gets created to always be an i32 type. The truncate would have trigered SimplifyDemandedBits to optimize it anyway. But using a more appropriate VT here is free and saves an optimization step. llvm-svn: 336051
* AMDGPU/GlobalISel: Make IMPLICIT_DEF of all sizes < 512 legal.Tom Stellard2018-06-301-2/+10
| | | | | | | | | | | | | | | | | | | Summary: We could split sizes that are not power of two into smaller sized G_IMPLICIT_DEF instructions, but this ends up generating G_MERGE_VALUES instructions which we then have to handle in the instruction selector. Since G_IMPLICIT_DEF is really a no-op it's easier just to keep everything that can fit into a register legal. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48777 llvm-svn: 336041
* [X86] Remove masking from avx512 rotate intrinsics. Use select in IR instead.Craig Topper2018-06-302-34/+36
| | | | llvm-svn: 336035
* [WebAssembly] Comment out a switch block in ISelDAGToDAGHeejin Ahn2018-06-291-5/+4
| | | | | | | | | | | | Summary: Fixes PR37977. Reviewers: RKSimon Subscribers: dschuff, sbc100, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D48737 llvm-svn: 336017
* AMDGPU: Don't use struct type for argument layoutMatt Arsenault2018-06-294-50/+71
| | | | | | | | | | This was introducing unnecessary padding after the explicit arguments, depending on the alignment of the total struct type. Also has the side effect of avoiding creating an extra GEP for the offset from the base kernel argument to the explicit kernel argument offset. llvm-svn: 335999
* [X86] Limit the number of target specific nodes emitted in LowerShiftPartsCraig Topper2018-06-291-12/+7
| | | | | | | | | | The important part is the creation of the SHLD/SHRD nodes. The compare and the conditional move can use target independent nodes that can be legalized on their own. This gives some opportunities to trigger the optimizations present in the lowering for those things. And its just better to limit the number of places we emit target specific nodes. The changed test cases still aren't optimal. Differential Revision: https://reviews.llvm.org/D48619 llvm-svn: 335998
* [X86] Use a std::vector for the memory unfolding table.Craig Topper2018-06-292-26/+56
| | | | | | | | | | Previously we used a DenseMap which is costly to set up due to multiple full table rehashes as the size increases and causes the table to be reallocated. This patch changes the table to a vector of structs. We now walk the reg->mem tables and push new entries in the mem->reg table for each row not marked TB_NO_REVERSE. Once all the table entries have been created, we sort the vector. Then we can use a binary search for lookups. Differential Revision: https://reviews.llvm.org/D48585 llvm-svn: 335994
* [mips] Support shrink-wrappingPetar Jovanovic2018-06-293-11/+11
| | | | | | | | | | Except for -O0, it's enabled by default. Patch by Vladimir Stefanovic. Differential Revision: https://reviews.llvm.org/D47947 llvm-svn: 335989
* [AMDGPU] Enable LICM in the BE pipelineStanislav Mekhanoshin2018-06-291-0/+1
| | | | | | | | | | This allows to hoist code portion to compute reciprocal of loop invariant denominator in integer division after codegen prepare expansion. Differential Revision: https://reviews.llvm.org/D48604 llvm-svn: 335988
* [Hexagon] Remove unused instruction itineraties, NFCKrzysztof Parzyszek2018-06-291-680/+0
| | | | llvm-svn: 335975
* [AArch64] Armv8.4-A: Virtualization system registersSjoerd Meijer2018-06-291-0/+22
| | | | | | | | This adds the Secure EL2 extension. Differential Revision: https://reviews.llvm.org/D48711 llvm-svn: 335962
* [X86][SSE] Support v16i8/v32i8 vector rotationsSimon Pilgrim2018-06-291-8/+77
| | | | | | | | | | This uses the same technique as for shifts - split the rotation into 4/2/1-bit partial rotations and select those partials based on the amount bit, making use of PBLENDVB if available. This halves the use of PBLENDVB compared to expanding to shifts, which can be a slow op. Unfortunately I haven't found a decent way to share much of this code with the shift equivalent. Differential Revision: https://reviews.llvm.org/D48655 llvm-svn: 335957
* [ARM][AArch64] Armv8.4-A EnablementSjoerd Meijer2018-06-296-2/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initial patch adding assembly support for Armv8.4-A. Besides adding v8.4 as a supported architecture to the usual places, this also adds target features for the different crypto algorithms. Armv8.4-A introduced new crypto algorithms, made them optional, and allows different combinations: - none of the v8.4 crypto functions are supported, which is independent of the implementation of the Armv8.0 SHA1 and SHA2 instructions. - the v8.4 SHA512 and SHA3 support is implemented, in this case the Armv8.0 SHA1 and SHA2 instructions must also be implemented. - the v8.4 SM3 and SM4 support is implemented, which is independent of the implementation of the Armv8.0 SHA1 and SHA2 instructions. - all of the v8.4 crypto functions are supported, in this case the Armv8.0 SHA1 and SHA2 instructions must also be implemented. The v8.4 crypto instructions are added to AArch64 only, and not AArch32, and are made optional extensions to Armv8.2-A. The user-facing Clang options will map on these new target features, their naming will be compatible with GCC and added in follow-up patches. The Armv8.4-A instruction sets can be downloaded here: https://developer.arm.com/products/architecture/a-profile/exploration-tools Differential Revision: https://reviews.llvm.org/D48625 llvm-svn: 335953
* [X86] Remove masking from the avx512 packed sqrt intrinsics. Use select in ↵Craig Topper2018-06-292-6/+31
| | | | | | | | IR instead. While there improve the coverage of the intrinsic testing and add fast-isel tests. llvm-svn: 335944
* AMDGPU: Separate R600 and GCN TableGen filesTom Stellard2018-06-2863-1508/+1854
| | | | | | | | | | | | | | | | | | | | | Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc. Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46365 llvm-svn: 335942
* [ARM] Assert that ARMDAGToDAGISel creates valid UBFX/SBFX nodes.Eli Friedman2018-06-281-0/+4
| | | | | | | | | | | | We don't ever check these again (unless you're using -fno-integrated-as), so make sure the extracted bits are well-defined. I don't think it's possible to trigger any of the assertions on trunk, but it's difficult to prove. (The first one depends on DAGCombine to minimize the number of set bits in AND masks; I think the others are mathematically impossible to hit.) llvm-svn: 335931
* [NVPTX] Delete dead codeBenjamin Kramer2018-06-285-63/+0
| | | | | | No functionality change. llvm-svn: 335913
* [ARM] Add missing Thumb2 assembler diagnostics.Eli Friedman2018-06-281-62/+116
| | | | | | | | | | Mostly just adding checks for Thumb2 instructions which correspond to ARM instructions which already had diagnostics. While I'm here, also fix ARM-mode strd to check the input registers correctly. Differential Revision: https://reviews.llvm.org/D48610 llvm-svn: 335909
* Remove unnecessary semicolon. NFCI.Simon Pilgrim2018-06-281-2/+2
| | | | | | Fixes -Wpedantic warning. llvm-svn: 335901
* [X86] Suppress load folding into and/or/xor if it will prevent matching ↵Craig Topper2018-06-281-0/+29
| | | | | | | | | | btr/bts/btc. This is a follow up to r335753. At the time I forgot about isProfitableToFold which makes this pretty easy. Differential Revision: https://reviews.llvm.org/D48706 llvm-svn: 335895
* Revert "Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC ↵Jonas Devlieghere2018-06-286-121/+30
| | | | | | | | | | | | | code models"" Reverting because this is causing failures in the LLDB test suite on GreenDragon. LLVM ERROR: unsupported relocation with subtraction expression, symbol '__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction expression llvm-svn: 335894
* [MachineOutliner] Define MachineOutliner support in TargetOptionsJessica Paquette2018-06-284-5/+7
| | | | | | | | | | | | | | | Targets should be able to define whether or not they support the outliner without the outliner being added to the pass pipeline. Before this, the outliner pass would be added, and ask the target whether or not it supports the outliner. After this, it's possible to query the target in TargetPassConfig, before the outliner pass is created. This ensures that passing -enable-machine-outliner will not modify the pass pipeline of any target that does not support it. https://reviews.llvm.org/D48683 llvm-svn: 335887
* [WebAssembly] Add getSetCCResultType placeholder override to handle vector ↵Simon Pilgrim2018-06-282-0/+12
| | | | | | | | compare results. Necessary to get the rL335821 bugfix (which was reverted at rL335871) un-reverted. llvm-svn: 335884
* SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O)Matthias Braun2018-06-283-3/+9
| | | | | | | | | | | | | Add NoTrapAfterNoreturn target option which skips emission of traps behind noreturn calls even if TrapUnreachable is enabled. Enable the feature on Mach-O to save code size; Comments suggest it is not possible to enable it for the other users of TrapUnreachable. rdar://41530228 DifferentialRevision: https://reviews.llvm.org/D48674 llvm-svn: 335877
* [AMDGPU] Early expansion of 32 bit udiv/uremStanislav Mekhanoshin2018-06-281-4/+316
| | | | | | | | | | | | This allows hoisting of a common code, for instance if denominator is loop invariant. Current change is expansion only, adding licm to the target pass list going to be a separate patch. Given this patch changes to codegen are minor as the expansion is similar to that on DAG. DAG expansion still must remain for R600. Differential Revision: https://reviews.llvm.org/D48586 llvm-svn: 335868
OpenPOWER on IntegriCloud