summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [x86] fix formatting; NFCSanjay Patel2016-12-121-1/+1
| | | | llvm-svn: 289476
* [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What ↵Eugene Zelenko2016-12-1210-135/+192
| | | | | | You Use warnings; other minor fixes (NFC). llvm-svn: 289475
* [PPC] Prefer direct move on power8 if load 1 or 2 bytes to VSRGuozhi Wei2016-12-122-1/+9
| | | | | | | | | Power8 has MTVSRWZ but no LXSIBZX/LXSIHZX, so move 1 or 2 bytes to VSR through MTVSRWZ is much faster than store the extended value into stack and load it with LXSIWZX. This patch fixes pr31144. Differential Revision: https://reviews.llvm.org/D27287 llvm-svn: 289473
* [mips] For PIC code convert unconditional jump to unconditional branchSimon Atanasyan2016-12-121-0/+11
| | | | | | | | | | | | Unconditional branch uses relative addressing which is the right choice in case of position independent code. This is a fix for the bug: https://dmz-portal.mips.com/bugz/show_bug.cgi?id=2445 Differential revision: https://reviews.llvm.org/D27483 llvm-svn: 289448
* AMDGPU: llvm.amdgcn.interp.mov is a source of divergenceNicolai Haehnle2016-12-121-0/+1
| | | | | | | | | | | | | | Summary: While the result is constant across a single primitive, each pixel shader wave can have pixels from multiple primitives. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27572 llvm-svn: 289447
* Update inline argument comment. NFCI.Simon Pilgrim2016-12-121-3/+3
| | | | | | combineX86ShufflesRecursively 'HasPSHUFB' flag has been the more generic 'HasVariableMask' flag for some time. llvm-svn: 289430
* [X86][SSE] Add support for combining SSE VSHLI/VSRLI uniform constant shifts.Simon Pilgrim2016-12-121-0/+33
| | | | | | Fixes some missed constant folding opportunities and allows us to combine shuffles that end with a logical bit shift. llvm-svn: 289429
* [X86][SSE] Lower suitably sign-extended mul vXi64 using PMULDQSimon Pilgrim2016-12-121-4/+21
| | | | | | | | PMULDQ returns the 64-bit result of the signed multiplication of the lower 32-bits of vXi64 vector inputs, we can lower with this if the sign bits stretch that far. Differential Revision: https://reviews.llvm.org/D27657 llvm-svn: 289426
* [X86] Teach selectScalarSSELoad to accept full 128-bit vector loads and the ↵Craig Topper2016-12-121-0/+22
| | | | | | | | X86ISD::VZEXT_LOAD opcode. Disable peephole on some of the tests that no longer require it to properly fold scalar intrinsics. llvm-svn: 289424
* [X86] Change CMPSS/CMPSD intrinsic instructions to use sse_load_f32/f64 as ↵Craig Topper2016-12-121-12/+12
| | | | | | | | | | its memory pattern instead of full vector load. These intrinsics only load a single element. We should use sse_loadf32/f64 to give more options of what loads it can match. Currently these instructions are often only getting their load folded thanks to the load folding in the peephole pass. I plan to add more types of loads to sse_load_f32/64 so we can match without the peephole. llvm-svn: 289423
* [X86] Remove some intrinsic instructions from hasPartialRegUpdateCraig Topper2016-12-121-8/+0
| | | | | | | | | | | | | | | | | Summary: These intrinsic instructions are all selected from intrinsics that have well defined behavior for where the upper bits come from. It's not the same place as the lower bits. As you can see we were suppressing load folding for these instructions in some cases. In none of the cases was the separate load helping avoid a partial dependency on the destination register. So we should just go ahead and allow the load to be folded. Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads. It only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed so I would like to fix sse_load_f32/f64 to allow that. And if I do that it changes some of these same test cases to fold the load too. Reviewers: spatel, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27611 llvm-svn: 289419
* [X86][SSE] Add support for combining target shuffles to SHUFPD.Simon Pilgrim2016-12-111-17/+45
| | | | llvm-svn: 289407
* [X86][AVX512] Add missing patterns for broadcast fallback in case load node ↵Ayman Musa2016-12-111-0/+6
| | | | | | | | | | | has multiple uses (for v4i64 and v4f64). When the load node which the broadcast instruction broadcasts has multiple uses, it cannot be folded. A fallback pattern is added to catch these cases and provide another solution. Differential Revision: https://reviews.llvm.org/D27661 llvm-svn: 289404
* [X86] Regcall - Adding support for mask typesOren Ben Simhon2016-12-113-38/+62
| | | | | | | | | Regcall calling convention passes mask types arguments in x86 GPR registers. The review includes the changes required in order to support v32i1, v16i1 and v8i1. Differential Revision: https://reviews.llvm.org/D27148 llvm-svn: 289383
* [X86] Fix a comment to say 'an FMA' instead of 'a FMA'. NFCCraig Topper2016-12-111-1/+1
| | | | llvm-svn: 289352
* [X86] Remove masking from 512-bit VPERMIL intrinsics in preparation for ↵Craig Topper2016-12-111-4/+2
| | | | | | being able to constant fold them in InstCombineCalls like we do for 128/256-bit. llvm-svn: 289350
* [AVR] Fix a signed vs unsigned compiler warningDylan McKay2016-12-111-1/+1
| | | | llvm-svn: 289349
* [AVR] Remove incorrect commentDylan McKay2016-12-101-2/+0
| | | | | | This should've been removed in r289323. llvm-svn: 289346
* [X86] Remove masking from 512-bit PSHUFB intrinsics in preparation for being ↵Craig Topper2016-12-101-2/+1
| | | | | | able to constant fold it in InstCombineCalls like we do for 128/256-bit. llvm-svn: 289344
* [X86][SSE] Ensure UNPCK inputs are a consistent value type in ↵Simon Pilgrim2016-12-101-2/+3
| | | | | | LowerHorizontalByteSum llvm-svn: 289341
* [AVX-512] Remove 128/256 masked vpermil instrinsics and autoupgrade to a ↵Craig Topper2016-12-101-8/+0
| | | | | | select around the unmasked avx1 intrinsics. llvm-svn: 289340
* AMDGPU: Fix asan errors when folding operandsMatt Arsenault2016-12-101-2/+2
| | | | | | | This was failing when trying to fold immediates into operand 1 of a phi, which only has one statically known operand. llvm-svn: 289337
* [X86][SSE] Move ZeroVector creation into the shuffle pattern case where its ↵Simon Pilgrim2016-12-101-2/+2
| | | | | | | | actually used. Also fix the ZeroVector's type - I've no idea how this hasn't caused problems........ llvm-svn: 289336
* [AVX-512] Add support for lowering (v2i64 (fp_to_sint (v2f32))) to ↵Craig Topper2016-12-102-6/+25
| | | | | | vcvttps2uqq when AVX512DQ and AVX512VL are available. llvm-svn: 289335
* [X86] Clarify indentation. NFCCraig Topper2016-12-101-1/+1
| | | | llvm-svn: 289334
* [X86] Combine LowerFP_TO_SINT and LowerFP_TO_UINT. They only differ by a ↵Craig Topper2016-12-102-24/+7
| | | | | | single boolean flag passed to a helper function. Just check the opcode and create the flag. llvm-svn: 289333
* [mips] Eliminate else-after-return. NFCSimon Atanasyan2016-12-101-4/+3
| | | | llvm-svn: 289331
* [AVR] Add a stub README fileDylan McKay2016-12-101-0/+8
| | | | llvm-svn: 289326
* [AVR] Fix and clean up the inline assembly testsDylan McKay2016-12-101-1/+3
| | | | | | | | | | There was a bug where we would hit an assertion if 'Q' was used as a constraint. I also removed hardcoded register names to prefer regexes so the tests don't break when the register allocator changes. llvm-svn: 289325
* [AVR] Fix an inline asm assertion which would always triggerDylan McKay2016-12-101-1/+1
| | | | | | | It looks like some time in the past, constraint codes were changed from chars being passed around to enums. llvm-svn: 289323
* [AVR] Use the register scavenger when expanding 'LDDW' instructionsDylan McKay2016-12-101-25/+45
| | | | | | | | | | | | Summary: This gets rid of the hardcoded 'r0' that was used previously. Reviewers: asl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27567 llvm-svn: 289322
* [AVR] Support stores to undefined pointersDylan McKay2016-12-101-1/+2
| | | | | | This would previously trigger an assertion error in AVRISelDAGToDAG. llvm-svn: 289321
* [X86] Use X86ISD::CVTTP2SI and X86ISD::CVTTP2UI for lowering 128-bit ↵Craig Topper2016-12-102-7/+7
| | | | | | | | | | cvttps2qq and cvttps2uqq intrinsics since there is a mismatch between number of input and output elements. Ideally ISD::FP_TO_SINT and ISD::FP_TO_UINT would only be used for cases with the same number of input and output elements. Similar things have already been done for other convert intrinsics. llvm-svn: 289316
* [AVR] Fix a bunch of incorrect assertion messagesDylan McKay2016-12-102-5/+5
| | | | | | | | | | These should've been checking whether the immediate is a 6-bit unsigned integer. If the immediate was '63', this would cause an assertion error which shouldn't have occurred. llvm-svn: 289315
* AMDGPU: Fix AMDGPUPromoteAlloca breaking addrspacecastsMatt Arsenault2016-12-101-1/+8
| | | | | | | The users of the addrspacecast were having their types incorrectly changed, producing invalid bitcasts between address spaces. llvm-svn: 289307
* AMDGPU: Fix handling of 16-bit immediatesMatt Arsenault2016-12-1019-222/+741
| | | | | | | | | | | | | | | | | | Since 32-bit instructions with 32-bit input immediate behavior are used to materialize 16-bit constants in 32-bit registers for 16-bit instructions, determining the legality based on the size is incorrect. Change operands to have the size specified in the type. Also adds a workaround for a disassembler bug that produces an immediate MCOperand for an operand that is supposed to be OPERAND_REGISTER. The assembler appears to accept out of bounds immediates and truncates them, but this seems to be an issue for 32-bit already. llvm-svn: 289306
* AMDGPU: Fix vintrp disassemblyMatt Arsenault2016-12-102-12/+12
| | | | llvm-svn: 289292
* AMDGPU: Change vintrp printing to better match scMatt Arsenault2016-12-102-12/+15
| | | | | | | Some of the immediates need to be printed differently eventually. llvm-svn: 289291
* [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What ↵Eugene Zelenko2016-12-0915-150/+243
| | | | | | You Use warnings; other minor fixes (NFC). llvm-svn: 289282
* AMDGPU/SI: Remove XNACK feature from CIMarek Olsak2016-12-091-2/+1
| | | | | | | | | | | | Summary: CI doesn't have XNACK. Reviewers: tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27175 llvm-svn: 289263
* AMDGPU/SI: Don't reserve XNACK when it's disabledMarek Olsak2016-12-092-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: This frees 2 additional scalar registers. These are results from all of my 3 patches combined: Polaris: Spilled SGPRs: 2231 -> 1517 (-32.00 %) Tonga: Spilled SGPRs: 3829 -> 2608 (-31.89 %) Spilled VGPRs: 100 -> 84 (-16.00 %) Tonga even spills SGPRs via VGPRs to scratch. That's a compute shader limited to 64 VGPRs. Reviewers: tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27151 llvm-svn: 289262
* AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objectsMarek Olsak2016-12-093-8/+22
| | | | | | | | | | | | Summary: This frees 2 scalar registers. Reviewers: tstellarAMD Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27150 llvm-svn: 289261
* AMDGPU/SI: Allow using SGPRs 96-101 on VIMarek Olsak2016-12-094-20/+42
| | | | | | | | | | | | | | | Summary: There is no point in setting SGPRS=104, because VI allocates SGPRs in multiples of 16, so 104 -> 112. That enables us to use all 102 SGPRs for general purposes. Reviewers: tstellarAMD Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27149 llvm-svn: 289260
* AMDGPU: Fix isTypeDesirableForOp for i16Matt Arsenault2016-12-091-4/+16
| | | | | | This should do nothing for targets without i16. llvm-svn: 289235
* [SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes (REAPPLIED)Simon Pilgrim2016-12-091-1/+6
| | | | | | Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type. llvm-svn: 289232
* AMDGPU: Fix i128 mulMatt Arsenault2016-12-091-0/+4
| | | | llvm-svn: 289231
* AMDGPU: Allow TBA, TMA, TTMP* registers with SMEM instructionsMatt Arsenault2016-12-091-2/+2
| | | | | | Fixes assembler regressions. llvm-svn: 289230
* AMDGPU: Clean up instruction bitsMatt Arsenault2016-12-092-98/+117
| | | | | | | | | Sort the instruction bits by type and make sure there is one for each format. Also cleanup namespaces. llvm-svn: 289229
* [PPC] Add intrinsics for vector extract word and vector insert word.Sean Fertile2016-12-091-0/+9
| | | | | Revision: https://reviews.llvm.org/D26547 llvm-svn: 289227
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2016-12-091-0/+10
| | | | | | | | UseAA is enabled." This reverts commit r289221 which appears to be triggering an assertion llvm-svn: 289226
OpenPOWER on IntegriCloud