path: root/llvm/lib/Target
Commit message (Author, Date, Files changed, Lines changed)
* [ARM] Make helpers static. NFC. (Benjamin Kramer, 2019-10-02, 1 file, -3/+5)
    llvm-svn: 373503
* [X86] Rewrite the vXi1 subvector insertion code to not rely on the value of bits that might be undef (Craig Topper, 2019-10-02, 1 file, -14/+26)
    The previous code tried to do a trick where we would extract the subvector from the location we were inserting, then xor that with the new value. Take the xored value and clear out the bits above the subvector size. Then shift that xored subvector to the insert location. And finally xor that with the original vector. Since the old subvector was used in both xors, this would leave just the new subvector at the inserted location. Since the surrounding bits had been zeroed, no other bits of the original vector would be modified.

    Unfortunately, if the old subvector came from undef we might aggressively propagate the undef. Then we end up with the XORs not cancelling because they aren't using the same value for the two uses of the old subvector. @bkramer gave me a case that demonstrated this, but we haven't reduced it enough to make it easy to see what's happening.

    This patch uses a safer, but more costly approach. It isolates the bits above the insertion point and the bits below it and ORs those together, leaving 0s at the insertion location. Then it widens the subvector with 0s in the upper bits, shifts it into position with 0s in the lower bits, and does another OR.

    Differential Revision: https://reviews.llvm.org/D68311

    llvm-svn: 373495
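    The mask-and-OR approach is easy to model on plain integers. Below is a minimal, hypothetical C++ sketch of the idea, using a 64-bit integer to stand in for the vXi1 vector; the function name and fixed width are illustrative, not the actual X86 lowering code:

    ```
    #include <cstdint>

    // Insert the low `Width` bits of `Sub` into `Vec` at bit position `Pos`
    // without ever reading the old bits at the insertion location.
    uint64_t insertSubvector(uint64_t Vec, uint64_t Sub, unsigned Pos,
                             unsigned Width) {
      // Isolate the bits below and above the insertion range and OR them,
      // leaving zeros at [Pos, Pos + Width).
      uint64_t LowMask = (Pos == 0) ? 0 : (~0ULL >> (64 - Pos));
      uint64_t HighMask = (Pos + Width == 64) ? 0 : (~0ULL << (Pos + Width));
      uint64_t Surround = (Vec & LowMask) | (Vec & HighMask);
      // Widen the subvector with zeros above, then shift it into position;
      // the old (possibly undef) contents are never part of the computation.
      uint64_t Widened = Sub & (~0ULL >> (64 - Width));
      return Surround | (Widened << Pos);
    }
    ```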
* [WebAssembly] Error when using wasm64 for ISel (Thomas Lively, 2019-10-02, 1 file, -0/+6)
    Summary: 64-bit WebAssembly (wasm64) is not specified and not supported in the WebAssembly backend. We do have support for it in clang, however, and we would like to keep that support because we expect wasm64 to be specified and supported in the future. For now, add an error when trying to use wasm64 from the backend to minimize user confusion from unexplained crashes.

    Reviewers: aheejin, dschuff, sunfish

    Subscribers: sbc100, jgravelle-google, hiraditya, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68254

    llvm-svn: 373493
* [AMDGPU] Extend buffer intrinsics with swizzling (Piotr Sobczak, 2019-10-02, 14 files, -155/+308)
    Summary: Extend the cachepolicy operand in the new VMEM buffer intrinsics to supply information on whether the buffer data is swizzled. Also propagate this information to MIR.

    Intrinsics updated:
      int_amdgcn_raw_buffer_load
      int_amdgcn_raw_buffer_load_format
      int_amdgcn_raw_buffer_store
      int_amdgcn_raw_buffer_store_format
      int_amdgcn_raw_tbuffer_load
      int_amdgcn_raw_tbuffer_store
      int_amdgcn_struct_buffer_load
      int_amdgcn_struct_buffer_load_format
      int_amdgcn_struct_buffer_store
      int_amdgcn_struct_buffer_store_format
      int_amdgcn_struct_tbuffer_load
      int_amdgcn_struct_tbuffer_store

    Furthermore, disable merging of VMEM buffer instructions in the SI Load/Store optimizer if the "swizzled" bit on the instruction is set. The default value of the bit is 0, meaning that data in the buffer is linear and buffer instructions can be merged.

    There is no difference in the generated code with this commit. However, in the future front-ends will be expected to use buffer intrinsics with the "swizzled" bit set correctly.

    Reviewers: arsenm, nhaehnle, tpr

    Reviewed By: nhaehnle

    Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68200

    llvm-svn: 373491
* [AArch64][SVE] Implement int_aarch64_sve_cnt intrinsic (Kerry McLaughlin, 2019-10-02, 2 files, -6/+16)
    Summary: This patch includes tests for the VecOfBitcastsToInt type added by D68021.

    Reviewers: c-rhodes, sdesmalen, rovka

    Reviewed By: c-rhodes

    Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, cfe-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68023

    llvm-svn: 373468
* [ARM] Identity shuffles are legal (David Green, 2019-10-02, 1 file, -0/+1)
    Identity shuffles, of the form (0, 1, 2, 3, ...), are perfectly OK under MVE (they essentially just become bitcasts). We were not catching that in the existing set of what we considered legal though. On NEON, they would be covered by vext's, but that is not generally available in MVE.

    This uses ShuffleVectorInst::isIdentityMask, which is a little odd to use here but does what we want and prevents us from just rewriting what is the same function.

    Differential Revision: https://reviews.llvm.org/D68241

    llvm-svn: 373446
* [AMDGPU] Make printf lowering faster when there are no printfs (Jay Foad, 2019-10-02, 1 file, -16/+14)
    Summary: Printf lowering unconditionally visited every instruction in the module. To make it faster in the common case where there are no printfs, look up the printf function (if any) and iterate over its users instead.

    Reviewers: rampitec, kzhuravl, alex-t, arsenm

    Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68145

    llvm-svn: 373433
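    A hedged sketch of the users-based scan, using the standard LLVM C++ API; this shows the general shape of the optimization rather than the pass's exact code:

    ```
    #include "llvm/IR/Function.h"
    #include "llvm/IR/Instructions.h"
    #include "llvm/IR/Module.h"

    using namespace llvm;

    // Visit only the call sites of printf instead of walking every
    // instruction in the module.
    static void visitPrintfCalls(Module &M) {
      Function *Printf = M.getFunction("printf");
      if (!Printf)
        return; // Common case: no printfs, nothing to do.
      for (User *U : Printf->users())
        if (auto *CI = dyn_cast<CallInst>(U))
          if (CI->getCalledFunction() == Printf) {
            // ... lower this printf call ...
          }
    }
    ```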
* [X86] Add broadcast load folding patterns to the NoVLX compare patterns. (Craig Topper, 2019-10-02, 1 file, -16/+138)
    These patterns use zmm registers for 128/256-bit compares when the VLX instructions aren't available. Previously we only supported registers, but as PR36191 notes we can fold broadcast loads, but not regular loads.

    llvm-svn: 373423
* AMDGPU/GlobalISel: Use getIntrinsicID helper (Matt Arsenault, 2019-10-02, 3 files, -7/+7)
    llvm-svn: 373417
* AMDGPU/GlobalISel: Assume VGPR for G_FRAME_INDEX (Matt Arsenault, 2019-10-02, 1 file, -1/+7)
    In principle this should behave as any other constant. However, eliminateFrameIndex currently assumes a VALU use and uses a vector shift. Work around this by selecting to VGPR for now until eliminateFrameIndex is fixed.

    llvm-svn: 373415
* AMDGPU/GlobalISel: Private loads always use VGPRs (Matt Arsenault, 2019-10-02, 1 file, -4/+6)
    llvm-svn: 373414
* AMDGPU/GlobalISel: Legalize 1024-bit G_BUILD_VECTOR (Matt Arsenault, 2019-10-02, 1 file, -4/+6)
    This will be needed to support AGPR operations.

    llvm-svn: 373413
* AMDGPU/GlobalISel: Fix RegBankSelect for 1024-bit values (Matt Arsenault, 2019-10-02, 1 file, -29/+35)
    llvm-svn: 373412
* [AMDGPU] separate accounting for agprs (Stanislav Mekhanoshin, 2019-10-02, 3 files, -7/+52)
    Account and report AGPRs separately on gfx908. Other targets do not change the reporting.

    Differential Revision: https://reviews.llvm.org/D68307

    llvm-svn: 373411
* [X86] Add a DAG combine to shrink vXi64 gather/scatter indices that are constant with sufficient sign bits to fit in vXi32 (Craig Topper, 2019-10-01, 1 file, -0/+34)
    The gather/scatter instructions can implicitly sign extend the indices. If we're operating on 32-bit data, a v16i64 index can force a v16i32 gather to be split in two since the index needs 2 registers. If we can shrink the index to i32 we can avoid the split.

    It should always be safe to shrink the index regardless of the number of elements. We have gather/scatter instructions that can use a v2i32 index stored in a v4i32 register with v2i64 data size.

    I've limited this to before legalize types to avoid creating a v2i32 after type legalization. We could check for it, but we'd also need testing. I'm also only handling build_vectors with no bitcasts to be sure the truncate will constant fold.

    Differential Revision: https://reviews.llvm.org/D68247

    llvm-svn: 373408
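    Narrowing a constant index is safe exactly when it survives a round trip through i32 under sign extension. A hypothetical standalone check in C++ (illustrative, not the DAG-combine code itself):

    ```
    #include <cstdint>

    // A 64-bit index can be narrowed to 32 bits for an implicitly
    // sign-extending gather/scatter iff sign-extending the truncated
    // value reproduces the original index.
    bool fitsInSExtI32(int64_t Idx) {
      return static_cast<int64_t>(static_cast<int32_t>(Idx)) == Idx;
    }
    ```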
* AMDGPU: Fix an out of date assert in addressing FrameIndex (Changpeng Fang, 2019-10-01, 1 file, -3/+2)
    Reviewers: arsenm

    Differential Revision: https://reviews.llvm.org/D67574

    llvm-svn: 373404
* Revert r373172 "[X86] Add custom isel logic to match VPTERNLOG from 2 logic ops." (Craig Topper, 2019-10-01, 1 file, -79/+1)
    This seems to be causing some performance regressions that I'm trying to investigate. One thing that stands out is that this transform can increase the live range of the operands of the earlier logic op. This can be bad for register allocation. If there are two logic op inputs we should really combine with the one that is closest, but SelectionDAG doesn't have a good way to do that. Maybe we need to do this as a basic block transform in Machine IR.

    llvm-svn: 373401
* [X86] convertToThreeAddress: make sure the second operand of SUB32ri is really an immediate before calling getImm(). (Craig Topper, 2019-10-01, 1 file, -0/+4)
    It might be a symbol instead. We can't fold those since we can't negate them. The same applies to the other SUB instructions with immediates.

    Fixes PR43529.

    llvm-svn: 373397
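    The defensive pattern here is the usual MachineOperand check; a hedged C++ sketch of the shape of the fix (the helper name is made up, and this is not the verbatim patch):

    ```
    #include "llvm/CodeGen/MachineInstr.h"
    #include "llvm/CodeGen/MachineOperand.h"

    using namespace llvm;

    // Returns true and sets `Imm` only when the second source operand is a
    // real immediate. getImm() asserts on non-immediate operands, and a
    // symbolic operand (e.g. a global address) cannot be negated anyway.
    static bool getFoldableSubImm(const MachineInstr &MI, int64_t &Imm) {
      const MachineOperand &MO = MI.getOperand(2);
      if (!MO.isImm())
        return false;
      Imm = MO.getImm();
      return true;
    }
    ```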
* AMDGPU/SILoadStoreOptimizer: Add helper functions for working with CombineInfo (Tom Stellard, 2019-10-01, 1 file, -205/+244)
    Summary: This is a refactoring that will make future improvements to this pass easier. This change should not change the behavior of the pass.

    Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin

    Reviewed By: nhaehnle, vpykhtin

    Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D65496

    llvm-svn: 373366
* AMDGPU/GlobalISel: Increase max legal size to 1024 (Matt Arsenault, 2019-10-01, 3 files, -10/+13)
    There are 1024-bit register classes defined for AGPRs. Additionally, OpenCL defines vectors up to 16 x i64, and this helps those tests legalize.

    llvm-svn: 373350
* [X86] Add a VBROADCAST_LOAD ISD opcode representing a scalar load broadcasted to a vector. (Craig Topper, 2019-10-01, 6 files, -182/+335)
    Summary: This adds the ISD opcode and a DAG combine to create it. There are probably some places where we can directly create it, but I'll leave that for future work.

    This updates all of the isel patterns to look for this new node. I had to add a few additional isel patterns for aligned extloads which we should probably fix with a DAG combine or something. This does mean that the broadcast load folding for avx512 can no longer match a broadcasted aligned extload.

    There's still some work to do here for combining a broadcast of a broadcast_load. We also need to improve extractelement or demanded vector elements of a broadcast_load. I'll try to get those done before I submit this patch.

    Reviewers: RKSimon, spatel

    Reviewed By: RKSimon

    Subscribers: hiraditya, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68198

    llvm-svn: 373349
* [AMDGPU] Add VerifyScheduling support. (Jay Foad, 2019-10-01, 1 file, -0/+18)
    Summary: This is cut and pasted from the corresponding GenericScheduler functions.

    Reviewers: arsenm, atrick, tstellar, vpykhtin

    Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68264

    llvm-svn: 373346
* [X86] Consider isCodeGenOnly in the EVEX2VEX pass to make VMAXPD/PS map to the non-commutable VEX instruction. Use an EVEX2VEX override to fix the scalar instructions. (Craig Topper, 2019-10-01, 1 file, -16/+25)
    Previously the match was ambiguous and VMAXPS/PD and VMAXCPS/PD were mapped to the same VEX instruction. But we should keep the commutability when changing the opcode.

    llvm-svn: 373303
* [WebAssembly] Make sure EH pads are preferred in sorting (Heejin Ahn, 2019-10-01, 1 file, -0/+1)
    Summary: In CFGSort, we try to give EH pads a higher priority as soon as they are ready to be sorted, to prevent creation of unwind destination mismatches in CFGStackify. We did that by making the priority queues' comparison function prefer EH pads, but under a certain condition it was still possible for an EH pad to be popped from the `Preferred` queue, not be sorted immediately, and enter the `Ready` queue instead. This patch makes sure that special condition does not consider EH pads as its candidates.

    Reviewers: dschuff

    Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68229

    llvm-svn: 373302
* [WebAssembly] Unstackify regs after fixing unwinding mismatches (Heejin Ahn, 2019-10-01, 2 files, -0/+25)
    Summary: Fixing unwind mismatches for exception handling can result in splicing existing BBs and moving some instructions to new BBs. In this case some of the stackified def registers in the original BB can end up used in the split BB. For example, suppose we have this BB, where %r0 is a stackified register:
    ```
    bb.1:
      %r0 = call @foo
      ... use %r0 ...
    ```
    After fixing unwind mismatches in CFGStackify, `bb.1` can be split and some instructions can be moved to a newly created BB:
    ```
    bb.1:
      %r0 = call @foo

    bb.split (new):
      ... use %r0 ...
    ```
    In this case we should make %r0 un-stackified, because its use is now in another BB. When splitting a BB, this CL unstackifies all def registers that have uses in the new split BB.

    Reviewers: dschuff

    Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68218

    llvm-svn: 373301
* AMDGPU/GlobalISel: Select s1 src G_SITOFP/G_UITOFP (Matt Arsenault, 2019-10-01, 4 files, -8/+55)
    llvm-svn: 373298
* AMDGPU/GlobalISel: Add support for init.exec intrinsics (Matt Arsenault, 2019-10-01, 6 files, -20/+42)
    The existing wave32 behavior seems broken and incomplete, but this reproduces it.

    llvm-svn: 373296
* AMDGPU/GlobalISel: Allow scc/vcc alternative mappings for s1 constants (Matt Arsenault, 2019-10-01, 1 file, -1/+15)
    llvm-svn: 373295
* AMDGPU/GlobalISel: Avoid creating shift of 0 in arg lowering (Matt Arsenault, 2019-10-01, 1 file, -3/+8)
    This is sort of papering over the fact that we don't run a combiner anywhere, but avoiding creating 2 instructions in the first place is easy.

    llvm-svn: 373293
* TLI: Remove DAG argument from getRegisterByName (Matt Arsenault, 2019-10-01, 20 files, -62/+60)
    Replace with the MachineFunction. X86 is the only user, and only uses it for the function. This removes one obstacle from using this in GlobalISel. The other is the more tolerable EVT argument.

    The X86 use of the function seems questionable to me. It checks hasFP, before frame lowering.

    llvm-svn: 373292
* AMDGPU/GlobalISel: Select G_UADDO/G_USUBO (Matt Arsenault, 2019-10-01, 3 files, -1/+47)
    llvm-svn: 373288
* GlobalISel: Implement widenScalar for G_SITOFP/G_UITOFP sources (Matt Arsenault, 2019-10-01, 1 file, -3/+9)
    Legalize 16-bit G_SITOFP/G_UITOFP for AMDGPU.

    llvm-svn: 373287
* AMDGPU/GlobalISel: Legalize G_GLOBAL_VALUE (Matt Arsenault, 2019-10-01, 3 files, -8/+105)
    Handle other cases besides LDS. Mostly a straight port of the existing handling, without the intermediate custom nodes.

    llvm-svn: 373286
* [LegacyPassManager] Deprecate the BasicBlockPass/Manager. (Alina Sbirlea, 2019-09-30, 2 files, -48/+51)
    Summary: The BasicBlockManager is potentially broken and should not be used. Replace all uses of the BasicBlockPass with a FunctionBlockPass+loop on blocks.

    Reviewers: chandlerc

    Subscribers: jholewinski, sanjoy.google, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D68234

    llvm-svn: 373254
* [X86] Mask off upper bits of splat element in LowerBUILD_VECTORvXi1 when forming a SELECT. (Craig Topper, 2019-09-30, 1 file, -2/+12)
    The i1 scalar would have been type legalized to i8, but that doesn't guarantee anything about the upper bits. If we're going to use it as a condition we need to make sure the upper bits are 0.

    I've special-cased ISD::SETCC conditions since that should guarantee zero upper bits. We could go further and use computeKnownBits, but we have no tests that would need that.

    Fixes PR43507.

    llvm-svn: 373246
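    The underlying hazard is easy to model in plain C++ (a hypothetical illustration with made-up names, not the lowering code): once an i1 has been widened to i8, its upper seven bits are unspecified, so anything used as a condition must test only bit 0:

    ```
    #include <cstdint>

    // After i1 -> i8 type legalization, only bit 0 of `Cond` is meaningful;
    // bits 7..1 may hold garbage. Masking them off first makes the select
    // well-defined no matter how the i8 was produced.
    int selectOnI1(uint8_t Cond, int A, int B) {
      return (Cond & 1) ? A : B; // mask off the undefined upper bits
    }
    ```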
* [X86] Address post-commit review from code I accidentally committed in r373136. (Craig Topper, 2019-09-30, 1 file, -3/+6)
    See https://reviews.llvm.org/D68167

    llvm-svn: 373245
* [NewPM] Port MachineModuleInfo to the new pass manager. (Yuanfang Chen, 2019-09-30, 2 files, -4/+4)
    Existing clients are converted to use MachineModuleInfoWrapperPass. The new interface is for defining a new pass manager API in CodeGen.

    Reviewers: fedor.sergeev, philip.pfaffe, chandlerc, arsenm

    Reviewed By: arsenm, fedor.sergeev

    Differential Revision: https://reviews.llvm.org/D64183

    llvm-svn: 373240
* [X86] Add ANY_EXTEND to the switch in ReplaceNodeResults, but just fall back to the default handling. (Craig Topper, 2019-09-30, 1 file, -0/+6)
    ANY_EXTEND of v8i8 is marked Custom on AVX512 for handling extends from v8i8. But the type legalization infrastructure will call ReplaceNodeResults for v8i8 results. We should just defer to the default handling instead of asserting in the default case of the switch.

    Fixes PR43509.

    llvm-svn: 373234
* [AArch64][SVE] Implement punpk[hi|lo] intrinsics (Kerry McLaughlin, 2019-09-30, 2 files, -2/+15)
    Summary: Adds the following two intrinsics:
      - int_aarch64_sve_punpkhi
      - int_aarch64_sve_punpklo

    This patch also contains a fix which allows LLVMHalfElementsVectorType to forward reference overloadable arguments.

    Reviewers: sdesmalen, rovka, rengolin

    Reviewed By: sdesmalen

    Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, greened, cfe-commits, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D67830

    llvm-svn: 373232
* [AArch64][GlobalISel] Support lowering variadic musttail calls (Jessica Paquette, 2019-09-30, 1 file, -11/+69)
    This adds support for lowering variadic musttail calls. To do this, we have to:

    - Detect a musttail call in a variadic function before attempting to lower the call's formal arguments. This is done in the IRTranslator.
    - Compute forwarded registers in `lowerFormalArguments`, and add copies for those registers.
    - Restore the forwarded registers in `lowerTailCall`.

    Because there doesn't seem to be any nice way to wrap these up into the outgoing argument handler, the restore code in `lowerTailCall` is done separately.

    Also, irritatingly, you have to make sure that the registers don't overlap with any passed parameters. Otherwise, the scheduler doesn't know what to do with the extra copies and asserts.

    Add call-translator-variadic-musttail.ll to test this. This is pretty much the same as the X86 musttail-varargs.ll test. We didn't have as nice of a test to base this off of, but the idea is the same.

    Differential Revision: https://reviews.llvm.org/D68043

    llvm-svn: 373226
* [mips] Fix code indentation. NFC (Simon Atanasyan, 2019-09-30, 1 file, -3/+3)
    llvm-svn: 373225
* [AMDGPU] SIFoldOperands should not fold registers across the EXEC definition (Alexander Timofeev, 2019-09-30, 1 file, -0/+7)
    Reviewers: rampitec

    Differential Revision: https://reviews.llvm.org/D67662

    llvm-svn: 373221
* [Alignment][NFC] Remove AllocaInst::setAlignment(unsigned) (Guillaume Chatelet, 2019-09-30, 3 files, -6/+7)
    Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790

    Reviewers: courbet

    Subscribers: jholewinski, arsenm, jvesely, nhaehnle, eraman, hiraditya, cfe-commits, llvm-commits

    Tags: #clang, #llvm

    Differential Revision: https://reviews.llvm.org/D68141

    llvm-svn: 373207
* [ARM][MVE] Change VCTP operand (Sam Parker, 2019-09-30, 1 file, -3/+3)
    The VCTP instruction calculates the predicate mask based upon the number of elements that need to be processed. I had inserted the sub before the vctp intrinsic and supplied it as the operand, but this is incorrect: the phi should directly feed the vctp. The sub is calculating the value for the next iteration.

    Differential Revision: https://reviews.llvm.org/D67921

    llvm-svn: 373188
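    As a scalar model of why the phi, and not the sub, must feed the vctp, consider a hypothetical tail-predicated loop over N elements with 4 lanes (illustrative C++, not the generated code): the active-lane predicate for the current iteration is derived from the element count before the decrement, while the decrement produces the count for the next iteration:

    ```
    #include <algorithm>
    #include <cstddef>

    void addArrays(const int *A, const int *B, int *C, std::ptrdiff_t N) {
      std::ptrdiff_t Remaining = N; // plays the role of the loop phi
      for (std::ptrdiff_t I = 0; Remaining > 0; I += 4) {
        // vctp(Remaining): the predicate comes from the phi value, so
        // lanes [0, Active) are enabled for *this* iteration.
        std::ptrdiff_t Active = std::min<std::ptrdiff_t>(Remaining, 4);
        for (std::ptrdiff_t Lane = 0; Lane < Active; ++Lane)
          C[I + Lane] = A[I + Lane] + B[I + Lane];
        Remaining -= 4; // the sub only feeds the next iteration's predicate
      }
    }
    ```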
* [ARM][CGP] Allow signext arguments (Sam Parker, 2019-09-30, 1 file, -5/+2)
    As we perform a zext on any arguments used in the promoted tree, it doesn't matter if they're marked as signext. The only permitted user(s) in the tree which would interpret the sign bits are signed icmps. For these instructions, their promoted operands are truncated before the icmp uses them.

    Differential Revision: https://reviews.llvm.org/D68019

    llvm-svn: 373186
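    A scalar C++ model of why this is sound (a hypothetical illustration, not backend code): zero-extending a sign-extended i8 argument changes the upper bits, but a signed compare in the promoted tree truncates back to i8 first, so it still observes the original value:

    ```
    #include <cstdint>

    // An i8 argument promoted to i32 with zext. Arithmetic on the low 8
    // bits is unaffected, and the signed icmp truncates its operands back
    // to i8, so whether the argument was signext never matters.
    bool promotedSignedLess(int8_t A, int8_t B) {
      uint32_t WideA = static_cast<uint8_t>(A); // zext, not sext
      uint32_t WideB = static_cast<uint8_t>(B);
      // Truncate before the signed comparison, as the promoted tree does.
      return static_cast<int8_t>(WideA) < static_cast<int8_t>(WideB);
    }
    ```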
* [SystemZ] Add SystemZPostRewrite in addPostRegAlloc() instead of at -O0. (Jonas Paulsson, 2019-09-30, 1 file, -1/+4)
    SystemZPostRewrite may emit COPYs and therefore needs to run before the Post-RA pseudo pass also at -O0, so it should be added in addPostRegAlloc().

    Review: Ulrich Weigand

    llvm-svn: 373182
* [X86] Remove some redundant isel patterns. NFCI (Craig Topper, 2019-09-30, 1 file, -78/+0)
    These are all also implemented in avx512_logical_lowering_types with support for masking.

    llvm-svn: 373181
* AMDGPU/GlobalISel: Fix select for v2s16 and/or/xor (Matt Arsenault, 2019-09-30, 1 file, -15/+17)
    llvm-svn: 373180
* [X86] Split v16i32/v8i64 bitreverse on avx512f targets without avx512bw to enable the use of vpshufb on the 256-bit halves. (Craig Topper, 2019-09-30, 1 file, -1/+12)
    llvm-svn: 373177
* [X86] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after r373174 (Fangrui Song, 2019-09-30, 1 file, -0/+1)
    llvm-svn: 373175