summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU] DAG combine to produce V_PERM_B32Stanislav Mekhanoshin2018-06-125-1/+214
| | | | | | Differential Revision: https://reviews.llvm.org/D48099 llvm-svn: 334559
* [DAGCombiner] Recognize more patterns for ABSKrzysztof Parzyszek2018-06-123-28/+11
| | | | | | Differential Revision: https://reviews.llvm.org/D47831 llvm-svn: 334553
* [AArch64] Support reserving x20 registerPetr Hosek2018-06-124-5/+25
| | | | | | | | | | | | Register x20 is a callee-saved register which may be used for other purposes in certain contexts, for example to hold special variables within the kernel. This change adds support for reserving this register both to frontend and backend to make this register usable for these purposes. Differential Revision: https://reviews.llvm.org/D46552 llvm-svn: 334531
* [X86] Remove mayLoad flag from AVX512 truncating store instructions.Craig Topper2018-06-121-2/+1
| | | | llvm-svn: 334529
* [MS][ARM64] Hoist __ImageBase handling into TargetLoweringObjectFileCOFFReid Kleckner2018-06-123-112/+0
| | | | | | | | | | | | All COFF targets should use @IMGREL32 relocations for symbol differences against __ImageBase. Do the same for getSectionForConstant, so that immediates lowered to globals get merged across TUs. Patch by Chris January Differential Revision: https://reviews.llvm.org/D47783 llvm-svn: 334523
* AMDHSA/NFC: Code object v3 updates (additional):Konstantin Zhuravlyov2018-06-122-13/+16
| | | | | | - Move section selection and alignment to AMDGPUAsmPrinter llvm-svn: 334521
* AMDHSA: Code object v3 updatesKonstantin Zhuravlyov2018-06-126-10/+184
| | | | | | | | | | | | | | | - Do not emit following assembler directives: - .hsa_code_object_version - .hsa_code_object_isa - .amd_amdgpu_isa - .amd_amdgpu_hsa_metadata - .amd_amdgpu_pal_metadata - Do not emit .note entries - Cleanup and bring in sync kernel descriptor header file - Emit kernel descriptor into .rodata with appropriate relocations and alignments llvm-svn: 334519
* [MC] [X86] Teach leaq _GLOBAL_OFFSET_TABLE(%rip), %r15 to use ↵Fangrui Song2018-06-121-1/+7
| | | | | | | | | | | | | | | | | | | R_X86_64_GOTPC32 instead of R_X86_64_PC32 Summary: This is similar to D46319 (ARM). x86-64 psABI p40 gives an example: leaq _GLOBAL_OFFSET_TABLE(%rip), %r15 # GOTPC32 reloc GNU as creates R_X86_64_GOTPC32. However, MC currently emits R_X86_64_PC32. Reviewers: javed.absar, echristo Subscribers: kristof.beyls, llvm-commits, peter.smith, grimar Differential Revision: https://reviews.llvm.org/D47507 llvm-svn: 334515
* [CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select ↵Simon Pilgrim2018-06-122-30/+30
| | | | | | | | | | | | | | | | | | (PR33744) As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources: e.g. v4f32: <0,5,2,7> or <4,1,6,3> This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline: e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc. This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns. Differential Revision: https://reviews.llvm.org/D47985 llvm-svn: 334513
* [X86] Remove TB_ALIGN_16 from VEXTRACTF128/VEXTRACTI128 in the memory ↵Craig Topper2018-06-121-2/+2
| | | | | | folding table. llvm-svn: 334511
* [Hexagon] Make floating point operations expensive for vectorizationKrzysztof Parzyszek2018-06-122-6/+35
| | | | llvm-svn: 334508
* [x86] move shrunkblend transform to helper function; NFCISanjay Patel2018-06-121-74/+76
| | | | | | | We should be able to obsolete D48043 by easing the constraints on this existing code. llvm-svn: 334504
* [SelectionDAG] Provide default expansion for rotatesKrzysztof Parzyszek2018-06-123-0/+32
| | | | | | | | | | | | | Implement default legalization of rotates: either in terms of the rotation in the opposite direction (if legal), or in terms of shifts and ors. Implement generating of rotate instructions for Hexagon. Hexagon only supports rotates by an immediate value, so implement custom lowering of ROTL/ROTR on Hexagon. If a rotate is not legal, use the default expansion. Differential Revision: https://reviews.llvm.org/D47725 llvm-svn: 334497
* [mips] Guard some floating point instructions correctlySimon Dardis2018-06-121-31/+37
| | | | | | | | Reviewers: smaksimovic, atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D47636 llvm-svn: 334491
* [mips] Extend LONG_BRANCH_LUi/ADDiu with extra parameterAleksandar Beserminji2018-06-123-22/+67
| | | | | | | | | | | Extend LONG_BRANCH_LUi and LONG_BRANCH_ADDiu pseudo instructions with additional flag, so instead of always lowering to lui %hi(...), addiu %lo(...) or addiu %hi(...), now they can lower to either %lo, %hi, %higher or %highest depending on the added flag. Differential Revision: https://reviews.llvm.org/D47941 llvm-svn: 334490
* [AArch64] Audit on rL333879 to fix FP16 64bit bitpatternsLuke Geeson2018-06-121-2/+2
| | | | llvm-svn: 334488
* [X86] Add NotMemoryFoldable to the VPCOMPRESS instructions.Craig Topper2018-06-121-4/+4
| | | | llvm-svn: 334481
* [X86] Add NotMemoryFoldable to more instructions.Craig Topper2018-06-121-14/+22
| | | | | | These include PUSH/POP instructions that don't match the manual table. This also includes CMPXCHG which we never emit in non-locked form. llvm-svn: 334479
* [X86] Add NotMemoryFoldable to a bunch of instructions to suppress them from ↵Craig Topper2018-06-124-46/+54
| | | | | | | | | | the autogenerated load folding table. Most of these are system instructions or other instructions we don't use in CodeGen. No point wasting space for them in the table. Removing them from the autogenerated table makes it easier to review the manual table. A few are real opcode collisions where the memory and register forms are completely different instructions. llvm-svn: 334474
* [X86] Add isel patterns for folding loads when creating ROUND instructions ↵Craig Topper2018-06-122-16/+200
| | | | | | | | | | | | | | | | from ffloor/fnearbyint/fceil/frint/ftrunc. We were missing packed isel folding patterns for all of sse41, avx, and avx512. For some reason avx512 had scalar load folding patterns under optsize(due to partial/undef reg update), but we didn't have the equivalent sse41 and avx patterns. Sometimes we would get load folding due to peephole pass anyway, but we're also missing avx512 instructions from the load folding table. I'll try to fix that in another patch. Some of this was spotted in the review for D47993. This patch adds all the folds to isel, adds a few spot tests, and disables the peephole pass on a few tests to ensure we're testing some of these patterns. llvm-svn: 334460
* [AMDGPU] prevent hitting Assertion `isReg() && "Wrong MachineOperand accessor"'Mark Searles2018-06-121-2/+2
| | | | | | | | | The use iterator, used within findMaskOperands(), can return anything which is not a def. isUse() requires a register, so check isReg() before calling isUse(). Differential Revision: https://reviews.llvm.org/D48047 llvm-svn: 334459
* Simplify; NFCGeorge Burgess IV2018-06-111-1/+1
| | | | | | Not shown in the diff: AQ is a `vector<SUnit *>`, and SU is a `SUnit *` llvm-svn: 334451
* AMDGPU: Add 64-bit relative variant kindKonstantin Zhuravlyov2018-06-111-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D47601 llvm-svn: 334443
* [X86] Push some variable declarations down into the individual switch cases ↵Craig Topper2018-06-111-3/+4
| | | | | | | | that need them. NFC All of the cases are already wrapped in curly braces so declaring a variable there isn't an issue. And the variables aren't assigned or used in the larger scope. llvm-svn: 334436
* [X86] Reorder some type constraints to force things to be vectors and ↵Craig Topper2018-06-111-4/+4
| | | | | | | | integer/fp before forcing them to be the same size. This may be needed by another patch that I'm working on. It should have no effect on any of the generated outputs. llvm-svn: 334430
* [Hexagon] Late predicate producers cannot be used as dot-new sourcesKrzysztof Parzyszek2018-06-111-4/+23
| | | | llvm-svn: 334426
* [X86][AVX512] Tag AVX5124FMAPS/AVX5124VNNIW with missing scheduler classesSimon Pilgrim2018-06-111-6/+12
| | | | | | | | Necessary for D46276 as even though btver2 doesn't use these instructions, its now flagged as complete so complains if ANY instruction isn't tagged..... UnsupportedFeatures wouldn't help here as these instructions don't appear to have a feature predicate (like a lot of AVX512). llvm-svn: 334423
* [AMDGPU] Do not consider indirect acces through phi for wave limiterStanislav Mekhanoshin2018-06-111-6/+0
| | | | | | | | | | | Rational: if there is indirect access that is usually an issue because load is not ready by the use. However, if use is inside a loop and load is outside that is potentially an issue for a first iteration only. Differential Revision: https://reviews.llvm.org/D47740 llvm-svn: 334420
* [mips] Fix spill slot for mips3, n64 abiAleksandar Beserminji2018-06-111-3/+4
| | | | | | | | | | | | | | When program is compiled for mips3 with n64 abi, wrong register class is used for creating an emergency spill slot. This patch fixes the correct register class to be chosen. This patch resolves PR35859. Thanks to John Baldwin for reporting the issue! Differential Revision: https://reviews.llvm.org/D47938 llvm-svn: 334419
* [AVR] Set trackLivenessAfterRegAllocDylan McKay2018-06-111-0/+5
| | | | | | | | | | | | | | | | | This sets trackLivenessAfterRegAlloc on AVRRegisterInfo. Most existing targets set this flag. Without it, specific IR inputs cause LLVM to fail with: Assertion failed: (getParent()->getProperties().hasProperty( MachineFunctionProperties::Property::TracksLiveness) && "Liveness information is accurate"), function livein_begin file MachineBasicBlock.cpp, line 1354. With this commit, this no longer happens. Patch by Peter Nimmervoll. llvm-svn: 334409
* [X86] Fix skylake server scheduling info.Clement Courbet2018-06-1111-310/+743
| | | | | | | | | | | | | | Summary: This fixes most of the scheduling info for SKX vector operations. I had to split a lot of the YMM/ZMM classes into separate classes for YMM and ZMM. The before/after llvm-exegesis analysis are in the phabricator diff. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47721 llvm-svn: 334407
* [ExynosM1][Sched] Fix resource usage in scheduling model.Clement Courbet2018-06-111-16/+16
| | | | | | This is part of https://reviews.llvm.org/D46356. llvm-svn: 334391
* [X86] Explicitly mark unsupported classes in scheduling models.Clement Courbet2018-06-119-111/+131
| | | | | | | | | | | | | Summary: In preparation for D47721. HSW and SNB still define unsupported classes as they are used by KNL and generic models respectively. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47763 llvm-svn: 334389
* [X86] Remove masking from dbpsadbw intrinsics, use select in IR instead.Craig Topper2018-06-112-9/+13
| | | | llvm-svn: 334384
* [Sparc] Add support for 13-bit PICDaniel Cederman2018-06-117-7/+49
| | | | | | | | | | | | | | | | | Summary: When compiling with -fpic, in contrast to -fPIC, use only the immediate field to index into the GOT. This saves space if the GOT is known to be small. The linker will warn if the GOT is too large for this method. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: brad, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D47136 llvm-svn: 334383
* [X86] Remove and autoupgrade the expandload and compressstore intrinsics.Craig Topper2018-06-112-131/+1
| | | | | | We use the target independent intrinsics now. llvm-svn: 334381
* [X86] Miscellaneous fixes to get the load folding table generator to work again.Craig Topper2018-06-103-9/+9
| | | | llvm-svn: 334377
* [NEON] Support VST1xN intrinsics in AArch32 mode (LLVM part)Ivan A. Kosarev2018-06-104-8/+165
| | | | | | | | | We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics. Differential Revision: https://reviews.llvm.org/D47447 llvm-svn: 334361
* [X86] Remove masking from the 512-bit masked floating point add/sub/mul/div ↵Craig Topper2018-06-102-19/+24
| | | | | | intrinsics. Use a select in IR instead. llvm-svn: 334358
* [X86] NFC Use member initialization in X86SubtargetGabor Buella2018-06-092-215/+107
| | | | | | | | The separate initializeEnvironment function was sort of useless since r217071. ARM did this move already with r273556. llvm-svn: 334345
* [ARM] Allow CMPZ transforms even if the input has multiple uses.Eli Friedman2018-06-081-1/+1
| | | | | | | | | | It looks like this got left in by accident in r289794; I can't think of any reason this check would be necessary. (Maybe it was meant to be a check that the AND has one use? But we check that a few lines earlier.) Differential Revision: https://reviews.llvm.org/D47921 llvm-svn: 334322
* [X86][SSE] Support v8i16/v16i16 rotationsSimon Pilgrim2018-06-081-14/+30
| | | | | | | | Extension to D46954 (PR37426), this patch adds support for v8i16/v16i16 rotations in a similar manner - the conversion of the shift/rotate amount to a multiplication factor and the use of PMULLW to shift left and PMULHUW (ISD::MULHU) to shift the wrapped bits back around to be ORd together. Differential Revision: https://reviews.llvm.org/D47822 llvm-svn: 334309
* [X86][BtVer2] Add support for all SUB/XOR 32/64 scalar instructions that ↵Simon Pilgrim2018-06-081-1/+8
| | | | | | | | should match the dependency-breaking 'zero-idiom' As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), these instructions are dependency breaking and fast-path zero the destination register (and appropriate EFLAGS bits). llvm-svn: 334303
* [AMDGPU] Inline asm - added i16, half and i128 types supportDaniil Fukalov2018-06-081-16/+32
| | | | | | | | | | AMDGPU inline assembler support i16, half and i128 typed variables in constraints, but they were reported as error. Needed to fix https://github.com/RadeonOpenCompute/ROCm/issues/341, e.g. to be able to load with global_load_dwordx4 to a 128bit integer variable Differential Revision: https://reviews.llvm.org/D44920 llvm-svn: 334301
* [X86][SSE] Simplify combineVectorTruncationWithPACKUS to reduce code duplicationSimon Pilgrim2018-06-081-37/+5
| | | | | | | | | | Simplify combineVectorTruncationWithPACKUS to mask the upper bits followed by calling truncateVectorWithPACK instead of duplicating with similar code. This results in the codegen using (V)PACKUSDW on SSE41+ targets for vXi64/vXi32 inputs where before it always used PACKUSWB (along with a lot more bitcasting). I've raised PR37749 as until we avoid unnecessary concats back to 256-bit for bitwise ops, we can't avoid splitting the input value into 128-bit subvectors for masking. llvm-svn: 334289
* [mips] Correct the predicates for a number of codegen only instructionsSimon Dardis2018-06-081-37/+52
| | | | | | | | Reviewers: smaksimovic, atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D47638 llvm-svn: 334280
* [RISCV] Implement MC layer support for the fence.tso instructionAlex Bradbury2018-06-081-0/+6
| | | | | | | | | | | The instruction makes use of a previously ignored field in the fence instruction. It is introduced in the version 2.3 draft of the RISC-V specification after much work by the Memory Model Task Group. As clarified here <https://github.com/riscv/riscv-isa-manual/issues/186>, the fence.tso assembler mnemonic does not have operands. llvm-svn: 334278
* [X86][SSE] Consistently prefer lowering to PACKUS over PACKSSSimon Pilgrim2018-06-081-32/+32
| | | | | | | | | | We have some combines/lowerings that attempt to use PACKSS-then-PACKUS and others that use PACKUS-then-PACKSS. PACKUS is much easier to combine with if we know the upper bits are zero as ComputeKnownBits can easily see through BITCASTs etc. especially now that rL333995 and rL334007 have landed. It also effectively works at byte level which further simplifies shuffle combines. The only (minor) annoyances are that ComputeKnownBits can sometimes take longer as it doesn't fail as quickly as ComputeNumSignBits (but I'm not seeing any actual regressions in tests) and PACKUSDW only became available after SSE41 so we have more codegen diffs between targets. llvm-svn: 334276
* AMDGPU: Error on LDS global address in functionsMatt Arsenault2018-06-081-1/+9
| | | | | | | These won't work as expected now, so error on them to avoid wasting time debugging this in the future. llvm-svn: 334269
* [NFC] fix formattingHiroshi Inoue2018-06-081-1/+1
| | | | llvm-svn: 334263
OpenPOWER on IntegriCloud