path: root/llvm/lib/Target
Commit message  Author  Age  Files  Lines
...
* [X86] Remove unnecessary WriteFVarBlend/WriteVarBlend InstRW overrides.Simon Pilgrim2018-04-225-117/+17
| | | | | | This also fixes some of the ReadAfterLd issues due to InstRW. llvm-svn: 330544
* [X86] Fix WriteMPSAD/WritePSADBW values to allow us to remove unnecessary ↵Simon Pilgrim2018-04-225-71/+8
| | | | | | instrw overrides. llvm-svn: 330542
* [X86][SandyBridge] Remove unnecessary WritePOPCNTLd overrides by fixing load ↵Simon Pilgrim2018-04-221-2/+1
| | | | | | latency. llvm-svn: 330541
* [X86] Change TB to PS on LFENCE instruction.Craig Topper2018-04-221-1/+1
| | | | | | This matches the other FENCE instructions. llvm-svn: 330533
* [X86] Remove OpSizeIgnore, it's not implemented any differently than ↵Craig Topper2018-04-223-7/+3
| | | | | | OpSizeFixed. llvm-svn: 330532
* [X86] Remove DATA32_PREFIX. Hack the printing for DATA16_PREFIX to print ↵Craig Topper2018-04-224-12/+25
| | | | | | | | | | 'data32' in 16-bit mode. Hack the asm parser to convert 'data32' to 'data16' in 16-bit mode. Improve the error messages to match GNU assembler. This also allows us to remove the hack from the disassembler table building. llvm-svn: 330531
* [X86] Strip unnecessary prefetch + vector move/load instrw overrides from ↵Simon Pilgrim2018-04-215-143/+6
| | | | | | scheduler models. llvm-svn: 330527
* [X86] Strip unnecessary WriteCvtF2I instrw overrides from scheduler models.Simon Pilgrim2018-04-212-6/+2
| | | | llvm-svn: 330525
* [X86] Strip unnecessary broadcast/shuffle256 instrw overrides from scheduler ↵Simon Pilgrim2018-04-214-133/+5
| | | | | | models. llvm-svn: 330523
* [X86][AVX] VPERM2F128/VINSERTF128 should be a shuffle256 schedule like ↵Simon Pilgrim2018-04-212-4/+6
| | | | | | VPERM2I128/VINSERTI128 llvm-svn: 330522
* [X86] Strip unnecessary vector integer math, shift-imm, extend, shuffle, ↵Simon Pilgrim2018-04-214-398/+12
| | | | | | pack/unpack instruction instrw overrides from scheduler models. llvm-svn: 330521
* [X86] Add DAG combine to turn (trunc (srl (mul ext, ext), 16) into ↵Craig Topper2018-04-211-0/+57
| | | | | | | | PMULHW/PMULHUW. Ultimately I want to use this to remove the intrinsics for these instructions. llvm-svn: 330520
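The pattern named in the commit above can be modeled on a single 16-bit lane. The following is a minimal scalar sketch, assuming the `ext` in the pattern is a sign extension for PMULHW and a zero extension for PMULHUW; the function names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Signed case: sign-extend both operands to 32 bits, multiply, shift the
 * product right by 16 and truncate. The result is returned as a raw 16-bit
 * pattern to avoid implementation-defined narrowing to int16_t. */
static uint16_t mulhi_s16_bits(int16_t a, int16_t b) {
    int32_t wide = (int32_t)a * (int32_t)b;    /* mul (sext a), (sext b) */
    return (uint16_t)((uint32_t)wide >> 16);   /* trunc (srl ..., 16)    */
}

/* Unsigned case: zero-extend both operands instead. The casts to uint32_t
 * also avoid signed overflow from C's integer promotion of uint16_t. */
static uint16_t mulhi_u16(uint16_t a, uint16_t b) {
    uint32_t wide = (uint32_t)a * (uint32_t)b; /* mul (zext a), (zext b) */
    return (uint16_t)(wide >> 16);             /* trunc (srl ..., 16)    */
}

/* Hypothetical self-check; call it from main() or a test harness. */
static void check_mulhi(void) {
    assert(mulhi_s16_bits(-1, 1) == 0xFFFFu);        /* high half of -1 */
    assert(mulhi_u16(0xFFFFu, 0xFFFFu) == 0xFFFEu);  /* 0xFFFE0001 >> 16 */
}
```

Both helpers keep only the upper half of the widened product, which is the property the combine relies on when it replaces the widen, multiply, shift, and truncate sequence with one instruction.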
* [X86] Add SchedWrites for LDMXCSR/STMXCSR.Craig Topper2018-04-2111-58/+53
| | | | llvm-svn: 330517
* [X86][Haswell] Strip unnecessary WriteFAdd/WriteFHAdd instruction instrw ↵Simon Pilgrim2018-04-211-16/+2
| | | | | | overrides. llvm-svn: 330514
* [X86][Broadwell] Remove unnecessary VORPD/VORPS instrw override - missed in ↵Simon Pilgrim2018-04-211-2/+0
| | | | | | D45629 llvm-svn: 330513
* [X86] Strip unnecessary WriteFRcp/WriteFRsqrt instruction instrw overrides ↵Simon Pilgrim2018-04-214-42/+8
| | | | | | | | from scheduler models. This required the default Skylake schedules to be updated - these were being completely overridden by the InstRW, and the existing values were not used at all. llvm-svn: 330510
* [X86] Strip unnecessary WriteFShuffle instruction instrw overrides from ↵Simon Pilgrim2018-04-214-143/+6
| | | | | | scheduler models. llvm-svn: 330508
* [X86][SandyBridge] Strip unnecessary MOVQ/CVT instruction instrw overrides.Simon Pilgrim2018-04-211-9/+3
| | | | llvm-svn: 330505
* [X86] Strip unnecessary MMX instruction instrw overrides from scheduler models.Simon Pilgrim2018-04-216-181/+9
| | | | llvm-svn: 330503
* [X86] Strip unnecessary x87 instruction instrw overrides from scheduler models.Simon Pilgrim2018-04-213-52/+4
| | | | llvm-svn: 330501
* [PowerPC] fix incorrect vectorization of abs() on POWER9Hiroshi Inoue2018-04-212-14/+95
| Vectorized loops with abs() return incorrect results on POWER9. This patch fixes it.
| For example, the following code returns a negative result if the input values are negative, even though it sums the absolute values of the inputs.
|
|     int vpx_satd_c(const int16_t *coeff, int length) {
|       int satd = 0;
|       for (int i = 0; i < length; ++i)
|         satd += abs(coeff[i]);
|       return satd;
|     }
|
| This problem causes test failures for libvpx.
|
| For vector absolute and vector absolute difference on POWER9, LLVM generates the VABSDUW (Vector Absolute Difference Unsigned Word) instruction or its variants. Since these instructions operate on unsigned integers, signed integers need an adjustment.
| For abs(sub(a, b)), we generate VABSDUW(a+0x80000000, b+0x80000000). Otherwise, abs(sub(-1, 0)) returns 0xFFFFFFFF(=-1) instead of 1.
| For abs(a), we generate VABSDUW(a+0x80000000, 0x80000000).
|
| Differential Revision: https://reviews.llvm.org/D45522
| llvm-svn: 330497
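The bias adjustment described in the PowerPC commit above can be checked on a single 32-bit lane. This is a minimal scalar sketch, not the vector code LLVM emits; `absdiff_u32` stands in for one VABSDUW lane, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one unsigned-word lane: |x - y| for unsigned values. */
static uint32_t absdiff_u32(uint32_t x, uint32_t y) {
    return x > y ? x - y : y - x;
}

/* Signed absolute difference built on the unsigned primitive.
 * Adding 0x80000000 flips the sign bit, which maps signed order onto
 * unsigned order while leaving the distance between the values unchanged. */
static uint32_t absdiff_s32(int32_t a, int32_t b) {
    const uint32_t bias = 0x80000000u;
    return absdiff_u32((uint32_t)a + bias, (uint32_t)b + bias);
}

/* abs(a) is the signed absolute difference between a and 0. */
static uint32_t abs_s32(int32_t a) {
    return absdiff_s32(a, 0);
}

/* Hypothetical self-check; call it from main() or a test harness. */
static void check_absdiff(void) {
    /* Without the bias, -1 is treated as 0xFFFFFFFF and the result is wrong. */
    assert(absdiff_u32((uint32_t)-1, 0u) == 0xFFFFFFFFu);
    /* With the bias, abs(sub(-1, 0)) is 1 as expected. */
    assert(absdiff_s32(-1, 0) == 1u);
    assert(abs_s32(-5) == 5u);
}
```

The first assertion reproduces the failure the commit message mentions (0xFFFFFFFF instead of 1); the second shows the biased form giving 1.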
* [X86] Add WriteFSign/WriteFLogic scheduler classesSimon Pilgrim2018-04-2013-312/+59
| Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes.
|
| This unearthed a couple of things that are also handled in this patch:
| (1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic
| (2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015.
| (3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops.
|
| Differential Revision: https://reviews.llvm.org/D45629
| llvm-svn: 330480
* [Hexagon] hexagon-autohvx was left on againKrzysztof Parzyszek2018-04-201-1/+1
| | | | llvm-svn: 330472
* [Hexagon] Improve HVX instruction selection (bitcast, vsplat)Krzysztof Parzyszek2018-04-208-71/+93
| | | | | | | | | | There was some unfortunate interaction between VSPLAT and BITCAST related to the selection of constant vectors (coming from selecting shuffles). Introduce VSPLATW that always splats a 32-bit word, and can have arbitrary result type (to avoid BITCASTs of VSPLAT). Clean up the previous selection of BITCAST/VSPLAT. llvm-svn: 330471
* [Hexagon] Skip fixed-stack indexes in HexagonConstExtendersKrzysztof Parzyszek2018-04-201-0/+7
| | | | | | | Fixed slots have negative values, and TRI::stackSlot2Index and TRI::index2StackSlot do not handle negative numbers. llvm-svn: 330468
* [X86][SandyBridge] Remove duplicate InstRWs from Sandy Bridge scheduler model.Craig Topper2018-04-201-60/+6
| | | | llvm-svn: 330465
* [X86] WaitPKG instructionsGabor Buella2018-04-208-13/+83
| Three new instructions:
|
| umonitor - Sets up a linear address range to be monitored by hardware and activates the monitor. The address range should be a writeback memory caching type.
| umwait - A hint that allows the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events.
| tpause - Directs the processor to enter an implementation-dependent optimized state until the TSC reaches the value in EDX:EAX.
|
| Also modifying the description of the mfence instruction, as the rep prefix (0xF3) was allowed before, which would conflict with umonitor during disassembly.
|
| Before:
|     $ echo 0xf3,0x0f,0xae,0xf0 | llvm-mc -disassemble
|     .text
|     mfence
|
| After:
|     $ echo 0xf3,0x0f,0xae,0xf0 | llvm-mc -disassemble
|     .text
|     umonitor %rax
|
| Reviewers: craig.topper, zvi
| Reviewed By: craig.topper
| Differential Revision: https://reviews.llvm.org/D45253
| llvm-svn: 330462
* [MachineOutliner] Change B instruction for tail calls to TCRETURNdiJessica Paquette2018-04-201-2/+3
| | | | | | | | | | First off, this is more correct than having the B. Second off, this was making a bot upset. This fixes that. Update the test to include -verify-machineinstrs as well to prevent stuff like this slipping by non debug/assert builds in the future. llvm-svn: 330459
* [X86][BtVer2] Cleanup some old FIXMEs from the model. NFCI.Simon Pilgrim2018-04-201-5/+2
| | | | llvm-svn: 330428
* [X86] Tag CLDEMOTE instruction with WriteLoad scheduling classSimon Pilgrim2018-04-201-1/+2
| | | | | | Same as other cacheline instructions llvm-svn: 330424
* [AArch64][SVE] Asm: Support for contiguous LD1 (scalar+scalar) load ↵Sander de Smalen2018-04-203-2/+53
| instructions.
|
| This is patch [4/4] in a series to add assembler/disassembler support for SVE's contiguous LD1 (scalar+scalar) instructions:
| - Patch [1/4]: https://reviews.llvm.org/D45687
| - Patch [2/4]: https://reviews.llvm.org/D45688
| - Patch [3/4]: https://reviews.llvm.org/D45689
| - Patch [4/4]: https://reviews.llvm.org/D45690
|
| Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
| Reviewed By: fhahn
| Differential Revision: https://reviews.llvm.org/D45690
| llvm-svn: 330423
* [AArch64][SVE] Fix diagnostic for SVE LD4 instructions:Sander de Smalen2018-04-201-1/+1
| | | | | | | | | | Diagnostic: 'index must be multiple of 3 in range [-32, 28]' Must be: 'index must be multiple of 4 in range [-32, 28]' llvm-svn: 330407
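The corrected diagnostic above states the constraint as "multiple of 4 in range [-32, 28]". A minimal sketch of that check follows; the function name is hypothetical and this is not the code in the AArch64 assembler parser.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the index satisfies the constraint quoted in the
 * diagnostic: a multiple of 4 within the inclusive range [-32, 28]. */
static bool is_valid_ld4_index(int index) {
    return index >= -32 && index <= 28 && index % 4 == 0;
}

/* Hypothetical self-check; call it from main() or a test harness. */
static void check_ld4_index(void) {
    assert(is_valid_ld4_index(-32));  /* lower bound, multiple of 4 */
    assert(is_valid_ld4_index(28));   /* upper bound, multiple of 4 */
    assert(!is_valid_ld4_index(3));   /* a multiple of 3 is not enough */
}
```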
* [AArch64][SVE] Added GPR64shifted and GPR64NoXZRshifted register classes.Sander de Smalen2018-04-205-7/+88
| Summary:
| This is patch [3/4] in a series to add assembler/disassembler support for SVE's contiguous LD1 (scalar+scalar) instructions:
| - Patch [1/4]: https://reviews.llvm.org/D45687
| - Patch [2/4]: https://reviews.llvm.org/D45688
| - Patch [3/4]: https://reviews.llvm.org/D45689
| - Patch [4/4]: https://reviews.llvm.org/D45690
|
| Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
| Reviewed By: SjoerdMeijer
| Subscribers: tschuett, kristof.beyls, llvm-commits
| Differential Revision: https://reviews.llvm.org/D45689
| llvm-svn: 330406
* Revert "This pass, fixing an erratum in some LEON 2 processors..."Daniel Cederman2018-04-205-18/+1
| Summary:
| Reading Atmel's AT697E errata document this does not seem like a valid workaround. While the text only mentions SDIV, it says that the ICC flags can be wrong, and those are only generated by SDIVcc. Verification on hardware shows that simply replacing SDIV with SDIVcc does not avoid the bug with negative operands.
|
| This reverts r283727.
|
| Reviewers: lero_chris, jyknight
| Reviewed By: jyknight
| Subscribers: fedor.sergeev, jrtc27, llvm-commits
| Differential Revision: https://reviews.llvm.org/D45813
| llvm-svn: 330397
* [Sparc] Use synthetic instruction clr to zero register instead of sethiDaniel Cederman2018-04-201-0/+3
| | | | | | | | | | | | | | | Using `clr reg`/`mov %g0, reg`/`or %g0, %g0, reg` to zero a register looks much better than `sethi 0, reg`. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D45810 llvm-svn: 330396
* [AArch64][AsmParser] Extend RegOp with integrated 'shift/extend'.Sander de Smalen2018-04-202-36/+113
| Summary:
| In some cases the shift/extend needs to be explicitly parsed together with the register, rather than as a separate operand. This is needed for addressing modes where the instruction as a whole dictates the scaling/extend, rather than specific bits in the instruction.
|
| By parsing them as a single operand, we avoid the need to pass an extra operand in all CodeGen patterns (because all operands need to have an associated value), and we avoid the need to update TableGen to accept operands that have no associated bits in the instruction.
|
| An added benefit of parsing them together is that the assembler can give a sensible diagnostic if the scaling is not correct.
|
| This is patch [2/4] in a series to add assembler/disassembler support for SVE's contiguous LD1 (scalar+scalar) instructions:
| - Patch [1/4]: https://reviews.llvm.org/D45687
| - Patch [2/4]: https://reviews.llvm.org/D45688
| - Patch [3/4]: https://reviews.llvm.org/D45689
| - Patch [4/4]: https://reviews.llvm.org/D45690
|
| Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
| Reviewed By: fhahn, SjoerdMeijer
| Subscribers: kristof.beyls, llvm-commits
| Differential Revision: https://reviews.llvm.org/D45688
| llvm-svn: 330394
* AMDGPU: Legalize the operand of SI_INIT_M0Nicolai Haehnle2018-04-201-0/+15
| Summary:
| This fixes a case where the argument to a sendmsg intrinsic ends up in a VGPR, for whatever reason.
|
| The underlying performance issue is that a multiplication that can be an s_mul_i32 is instead needlessly generated as v_mul_u32_u24, but this is not addressed by this patch.
|
| Change-Id: I61fd4034314d5acdf6074632c30b65364dfa7328
| Reviewers: arsenm, rampitec
| Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
| Differential Revision: https://reviews.llvm.org/D45826
| llvm-svn: 330393
* [Sparc] Fix addressing mode when using 64-bit values in inline assemblyDaniel Cederman2018-04-201-0/+2
| Summary:
| If a 64-bit register is used as an operand in inline assembly together with a memory reference, the memory addressing will be wrong. The addressing will be a single reg, instead of reg+reg or reg+imm. This will generate a bad offset value or an exception in printMemOperand().
|
| For example:
| ```
| long long int val = 5;
| long long int mem;
| __asm__ volatile ("std %1, %0":"=m"(mem):"r"(val));
| ```
| becomes:
| ```
| std %i0, [%i2+589833]
| ```
|
| The problem is that SelectInlineAsmMemoryOperand() is never called for the memory references if one of the operands is a 64-bit register. By calling SelectInlineAsmMemoryOperands() in tryInlineAsm() the Sparc version of SelectInlineAsmMemoryOperand() gets called for each memory reference.
|
| Reviewers: jyknight, venkatra
| Reviewed By: jyknight
| Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits
| Differential Revision: https://reviews.llvm.org/D45761
| llvm-svn: 330392
* [AMDGPU] Use packed literals with zero either lower or hi partStanislav Mekhanoshin2018-04-192-2/+21
| | | | | | Differential Revision: https://reviews.llvm.org/D45790 llvm-svn: 330365
* [X86] Enable popcnt false dependency breaking on Silvermont and Goldmont.Craig Topper2018-04-191-2/+6
| | | | | | Silvermont and Goldmont have the same issue on popcnt as Sandy Bridge, Haswell, Broadwell, and Skylake. Believe it is fixed in Goldmont Plus. llvm-svn: 330358
* [X86][SLM] Fix typo using SandyBridge resources. Simon Pilgrim2018-04-191-2/+2
| | | | | | Luckily this was on instructions not supported on Silvermont.... llvm-svn: 330351
* [X86] Correct the scheduling data for register forms of XCHG and XADD on ↵Craig Topper2018-04-195-22/+24
| | | | | | | | | | | | Intel CPUs. The XCHG16rr/XCHG32rr/XCHG64rr instructions should be 3 uops just like XCHG8rr. I believe they're just implemented as 3 move uops with a temporary register. XADD is probably 2 moves and an add also using a temporary register. Change the latency for both from 2 cycles to 3 cycles. Only 2 of the uops are serialized in their execution, the move into the temporary and the move out of the temporary. The move from one GPR to the other should be able to go in parallel with this if there are ALU resources available. llvm-svn: 330349
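The uop breakdown that the commit above hypothesizes can be written out as a scalar model. This is only an illustration of the described sequence (three moves for XCHG, two moves and an add for XADD, each through a temporary); it says nothing about how the hardware actually implements these instructions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* XCHG r1, r2 modeled as three register moves through a temporary. */
static void xchg_model(uint32_t *r1, uint32_t *r2) {
    uint32_t tmp = *r1; /* move 1: into the temporary */
    *r1 = *r2;          /* move 2: GPR to GPR, does not read the temporary */
    *r2 = tmp;          /* move 3: out of the temporary, depends on move 1 */
}

/* XADD dest, src: dest receives the sum, src receives the old dest. */
static void xadd_model(uint32_t *dest, uint32_t *src) {
    uint32_t tmp = *dest + *src; /* add: into the temporary */
    *src = *dest;                /* move: old dest to src */
    *dest = tmp;                 /* move: out of the temporary */
}

/* Hypothetical self-check; call it from main() or a test harness. */
static void check_xchg_xadd(void) {
    uint32_t a = 1, b = 2;
    xchg_model(&a, &b);
    assert(a == 2 && b == 1);

    uint32_t dest = 10, src = 5;
    xadd_model(&dest, &src);
    assert(dest == 15 && src == 10);
}
```

In `xchg_model`, only moves 1 and 3 form a dependency chain through the temporary, which matches the commit's remark that two of the uops are serialized while the GPR-to-GPR move can run in parallel.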
* [X86] Merge some MMX instregexSimon Pilgrim2018-04-195-269/+88
| | | | | | There's a lot more but I'd prefer focussing on removing unnecessary InstRWs first. llvm-svn: 330347
* [Hexagon] Use legal types when lowering CONCAT_VECTORS via BUILD_VECTORKrzysztof Parzyszek2018-04-191-0/+26
| | | | llvm-svn: 330344
* [AMDGPU] Do not only rely on BB number when finding bottom loopMark Searles2018-04-191-20/+45
| | | | | | | | We should also check that the "bottom" basic block of a loop is a successor of the "header" basic block, otherwise we don't propagate the information correctly when the CFG is complex. This fixes an important rendering problem with Wolfenstein 2, because one vector-memory wait was missing. Differential Revision: https://reviews.llvm.org/D43831 llvm-svn: 330337
* [Hexagon] Generate code for vector bswap intrinsicsKrzysztof Parzyszek2018-04-191-0/+5
| | | | llvm-svn: 330333
* [X86][BtVer2] Remove SSE4A EXTRQ/EXTRQI InstRW overrides.Simon Pilgrim2018-04-191-4/+0
| | | | | | These are already handled identically by WriteALU. llvm-svn: 330332
* [Hexagon] Add/fix patterns for 32/64-bit vector compares and logical opsKrzysztof Parzyszek2018-04-192-99/+89
| | | | llvm-svn: 330330
* [mips] Correct the definitions of the unaligned word memory operation ↵Simon Dardis2018-04-194-25/+40
| | | | | | | | | | | | | | | | instructions These instructions lacked the correct predicates, were not marked as loads and stores and lacked the proper instruction mapping information. In the case of microMIPS sw(l|r)e (EVA) these instructions were using the load EVA description. Reviewers: abeserminji, smaksimovic, atanasyan Differential Revision: https://reviews.llvm.org/D45626 llvm-svn: 330326
* Lowering x86 adds/addus/subs/subus intrinsics (llvm part)Alexander Ivchenko2018-04-192-40/+89
| This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations. The patch also includes folding of previously missing saturation patterns so that IR emits the same machine instructions as the intrinsics.
|
| Patch by tkrupa
|
| Differential Revision: https://reviews.llvm.org/D44785
| llvm-svn: 330322
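The saturating operations named in the commit above (adds/addus/subs/subus) can be modeled on one 8-bit lane. This is a minimal scalar sketch of their semantics only; the exact IR patterns the patch emits and folds are not shown in the commit message, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturating add: clamp to 255 when the 8-bit sum wraps around. */
static uint8_t addus8(uint8_t a, uint8_t b) {
    uint8_t sum = (uint8_t)(a + b);
    return sum < a ? UINT8_MAX : sum;
}

/* Unsigned saturating subtract: clamp to 0 instead of wrapping. */
static uint8_t subus8(uint8_t a, uint8_t b) {
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Clamp a widened result into the signed 8-bit range. */
static int8_t clamp_s8(int wide) {
    if (wide > INT8_MAX) return INT8_MAX;
    if (wide < INT8_MIN) return INT8_MIN;
    return (int8_t)wide;
}

/* Signed saturating add and subtract, computed in a wider type. */
static int8_t adds8(int8_t a, int8_t b) { return clamp_s8(a + b); }
static int8_t subs8(int8_t a, int8_t b) { return clamp_s8(a - b); }

/* Hypothetical self-check; call it from main() or a test harness. */
static void check_saturation(void) {
    assert(addus8(200, 100) == 255);
    assert(subus8(3, 5) == 0);
    assert(adds8(100, 100) == 127);
    assert(subs8(-100, 100) == -128);
}
```

Each helper is an add or subtract followed by a compare and a select, which is the general shape of operation that plain IR can express without a target-specific intrinsic.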