path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* [AArch64][SVE] Asm: Support for saturating INC/DEC (32bit scalar) instructions. | Sander de Smalen | 2018-06-18 | 6 | -14/+148
  The variants added by this patch are:
  - SQINC signed increment, e.g. sqinc x0, w0, all, mul #4
  - SQDEC signed decrement, e.g. sqdec x0, w0, all, mul #4
  - UQINC unsigned increment, e.g. uqinc w0, all, mul #4
  - UQDEC unsigned decrement, e.g. uqdec w0, all, mul #4

  This patch includes asmparser changes to parse a GPR64 as a GPR32 in order to satisfy the constraint check: x0 == GPR64(w0) in:
      sqinc x0, w0, all, mul #4
            ^___^ (must match)

  Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar
  Reviewed By: fhahn
  Differential Revision: https://reviews.llvm.org/D47716
  llvm-svn: 334980
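The saturating behaviour named in the commit above can be illustrated with a small Python model. This is only a sketch, not LLVM or architecture pseudocode: the function names are made up, and `elements` is a hypothetical parameter standing in for the element count that the predicate pattern (e.g. `all`) selects, which depends on the hardware vector length.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
UINT32_MAX = 2**32 - 1


def sqinc32(value, elements, mul=1):
    """Signed saturating increment of a 32-bit value by elements * mul."""
    return max(INT32_MIN, min(INT32_MAX, value + elements * mul))


def sqdec32(value, elements, mul=1):
    """Signed saturating decrement of a 32-bit value by elements * mul."""
    return max(INT32_MIN, min(INT32_MAX, value - elements * mul))


def uqinc32(value, elements, mul=1):
    """Unsigned saturating increment: clamps at 2**32 - 1 instead of wrapping."""
    return min(UINT32_MAX, value + elements * mul)


def uqdec32(value, elements, mul=1):
    """Unsigned saturating decrement: clamps at 0 instead of wrapping."""
    return max(0, value - elements * mul)
```

With an assumed element count of 4 and `mul #4`, `sqinc32(10, 4, 4)` gives 26, while a value near the top of the signed range is clamped to `INT32_MAX` rather than wrapping.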
* [WebAssembly] Cleaned up register accessors in WebAssemblyMachineFunctionInfo.h | Wouter van Oortmerssen | 2018-06-18 | 1 | -10/+14
  Tested: llvm-lit -v `find test -name WebAssembly`
  (This is a commit access "test commit" :)
  llvm-svn: 334979
* [X86] Encode the EVEX2VEX exception list information in .td files instead of the emitter source. | Craig Topper | 2018-06-18 | 2 | -15/+38
  Rather than having an exclusion list in tablegen sources, add a flag to the X86 instruction records that can be used to suppress checking for convertibility.
  llvm-svn: 334971
* [AArch64][SVE] Asm: Support for saturating INC/DEC (64bit scalar) instructions. | Sander de Smalen | 2018-06-18 | 2 | -0/+51
  Summary: The variants added by this patch are:
  - SQINC (signed increment)
  - UQINC (unsigned increment)
  - SQDEC (signed decrement)
  - UQDEC (unsigned decrement)

  For example:
      uqincw x0, all, mul #4

  Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar
  Differential Revision: https://reviews.llvm.org/D47715
  llvm-svn: 334948
* [X86][BtVer2] Flag AVX2+ scheduler classes as unsupported | Simon Pilgrim | 2018-06-18 | 1 | -20/+20
  Jaguar only supports up to AVX1.
  Differential Revision: https://reviews.llvm.org/D48274
  llvm-svn: 334947
* [AArch64][SVE] Asm: Support for vector element compares. | Sander de Smalen | 2018-06-18 | 2 | -0/+102
  This patch adds instructions for comparing elements from two vectors, e.g.
      cmpgt p0.s, p0/z, z0.s, z1.s
  and also adds support for comparing to a 64-bit wide element vector, e.g.
      cmpgt p0.s, p0/z, z0.s, z1.d

  The patch also contains aliases for certain comparisons, e.g.:
      cmple p0.s, p0/z, z0.s, z1.s => cmpge p0.s, p0/z, z1.s, z0.s
      cmplo p0.s, p0/z, z0.s, z1.s => cmphi p0.s, p0/z, z1.s, z0.s
      cmpls p0.s, p0/z, z0.s, z1.s => cmphs p0.s, p0/z, z1.s, z0.s
      cmplt p0.s, p0/z, z0.s, z1.s => cmpgt p0.s, p0/z, z1.s, z0.s
  llvm-svn: 334931
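The aliases listed in the commit above work because each "less" comparison equals the matching "greater" comparison with its two vector operands swapped. The following Python sketch models this on plain lists; it is an illustration only, the function names are hypothetical, and the `/z` predication is modelled as "inactive lanes produce false".

```python
def cmpge(pred, a, b):
    """Predicated element-wise a >= b; inactive lanes yield False (zeroing)."""
    return [p and x >= y for p, x, y in zip(pred, a, b)]


def cmpgt(pred, a, b):
    """Predicated element-wise a > b; inactive lanes yield False (zeroing)."""
    return [p and x > y for p, x, y in zip(pred, a, b)]


def cmple(pred, a, b):
    """Alias: a <= b is computed as b >= a, i.e. cmpge with operands swapped."""
    return cmpge(pred, b, a)


def cmplt(pred, a, b):
    """Alias: a < b is computed as b > a, i.e. cmpgt with operands swapped."""
    return cmpgt(pred, b, a)
```

For example, with predicate `[True, True, False, True]`, `a = [1, 5, 3, 2]` and `b = [2, 5, 1, 1]`, `cmple` yields `[True, True, False, False]`: the third lane is inactive and the fourth fails the comparison.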
* [X86] Fix NOOP sched overrides on BDW/HSW/SKL. | Clement Courbet | 2018-06-18 | 3 | -6/+3
  Summary: Noop certainly does not use resources.
  Reviewers: RKSimon, craig.topper, andreadb
  Subscribers: gbedwell, llvm-commits, gchatelet
  Differential Revision: https://reviews.llvm.org/D48028
  llvm-svn: 334927
* [X86] Create X86InstrFMA3Group objects fully in a static table instead of on the heap. NFCI | Craig Topper | 2018-06-18 | 2 | -227/+146
  Previously we heap allocated the X86InstrFMA3Group objects which were created by passing them small register/memory opcode arrays that existed as individual static tables.

  Rather than a bunch of small static arrays we now have one large static table of X86InstrFMA3Group objects. Rather than storing a pointer to the opcode arrays in the X86InstrFMA3Group object, we now store a register and a memory array as part of the object. If a group doesn't have memory or register opcodes, the array entries will be 0.

  This greatly simplifies the destruction of the X86InstrFMA3Info object. We no longer need to delete the X86InstrFMA3Group objects as we destruct the DenseMap. And we don't need to keep track of which ones we already deleted.

  This reduces the llc binary size on my local machine by ~50k. I can only assume that's really due to the fact that we had something like 512 small static arrays that we passed to the init functions either one at a time or in pairs. So there were between 256 and 512 distinct calls to the init functions in the initOnceImpl method.
  llvm-svn: 334925
* [X86] Add '.s' aliases to the assembler for the various redundant move encodings to match gas and our EVEX instructions. | Craig Topper | 2018-06-18 | 4 | -6/+77
  We already have these aliases for EVEX encoded instructions, but not for the GPR, MMX, SSE, and VEX versions.
  Also remove the vpextrw.s EVEX alias. That's not something gas implements.
  llvm-svn: 334922
* [X86] Move the 'vmovq.s' and similar assembly strings for EVEX vector moves with reversed operands to InstAliases. | Craig Topper | 2018-06-18 | 1 | -45/+80
  The .s assembly strings allow the reversed forms to be targeted from assembly, which matches gas behavior. But when printing the instructions we should print them without the .s to match other tooling like objdump. By using InstAliases we can use the normal string in the instruction and just hide it from the assembly parser.

  Ideally we'd add the .s versions to the legacy SSE and VEX versions as well for full compatibility with gas. Not sure how we got to a state where only EVEX was supported.
  llvm-svn: 334920
* [X86] Add all the FMA instructions directly to the load folding table instead of proxying through X86InstrFMA3Info. | Craig Topper | 2018-06-17 | 2 | -100/+544
  This increases the size of the static tables, but is closer to what we would get if we used the autogenerated table directly. This reduces the remaining large deltas between what's in the manual table and what's in the autogenerated table.
  llvm-svn: 334915
* [X86] Pass the parent SDNode to X86DAGToDAGISel::selectScalarSSELoad to simplify the hasSingleUseFromRoot handling. | Craig Topper | 2018-06-17 | 2 | -14/+12
  Some of the calls to hasSingleUseFromRoot were passing the load itself. If the load's chain result has a user this would count against that. By getting the true parent of the match and ensuring any intermediates between the match and the load have a single use we can avoid this case. isLegalToFold will take care of checking users of the load's data output.

  This fixed at least fma-scalar-memfold.ll to succeed without the peephole pass.
  llvm-svn: 334908
* [AArch64][SVE] Asm: Support for bitwise operations on predicate vectors. | Sander de Smalen | 2018-06-17 | 1 | -0/+29
  This patch adds support for instructions performing bitwise operations on predicate vectors, including AND, BIC, EOR, NAND, NOR, ORN, ORR, and their status flag setting variants ANDS, BICS, EORS, NANDS, ORNS, ORRS.

  This patch also adds several aliases:
      orr  p0.b, p1/z, p1.b, p1.b => mov  p0.b, p1.b
      orrs p0.b, p1/z, p1.b, p1.b => movs p0.b, p1.b
      and  p0.b, p1/z, p2.b, p2.b => mov  p0.b, p1/z, p2.b
      ands p0.b, p1/z, p2.b, p2.b => movs p0.b, p1/z, p2.b
      eor  p0.b, p1/z, p2.b, p1.b => not  p0.b, p1/z, p2.b
      eors p0.b, p1/z, p2.b, p1.b => nots p0.b, p1/z, p2.b
  llvm-svn: 334906
* [AArch64][SVE] Asm: Support for SEL (vector/predicate) instructions. | Sander de Smalen | 2018-06-17 | 2 | -0/+81
  Support for SVE's predicated select instructions to select elements from either vector, both in a data-vector and a predicate-vector variant.
  llvm-svn: 334905
* [NVPTX] Ignore target-cpu and -features for inlining | Jonas Hahnfeld | 2018-06-17 | 1 | -0/+8
  We don't want to prevent inlining because of target-cpu and -features attributes that were added to newer versions of LLVM/Clang: There are no incompatible functions in PTX, ptxas will throw errors in such cases.
  Differential Revision: https://reviews.llvm.org/D47691
  llvm-svn: 334904
* [WebAssembly] Simple comment fix. NFC. | Heejin Ahn | 2018-06-17 | 1 | -1/+1
  llvm-svn: 334899
* [X86] More additions to the load folding tables based on the autogenerated tables. | Craig Topper | 2018-06-16 | 5 | -33/+814
  Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table.
  llvm-svn: 334898
* [X86] Hide POP16/32/64rmr and PUSH16/32/64rmr instructions from the assembly parser. | Craig Topper | 2018-06-16 | 1 | -0/+12
  These all have a short form encoding that the assembler already prefers. Though that preference seems to only be based on order in the .td file. Hiding the long form saves space in the table and prevents us from breaking the implicit order based priority.
  llvm-svn: 334897
* [X86] Fix an inconsistency between AVX512 and AVX/SSE version on a couple instructions. | Craig Topper | 2018-06-16 | 1 | -2/+2
  VMOVPQIto64Zmr is not a 64-bit mode only instruction. But I don't know how to test this because VMOVPQIto64mr should always have priority over it in 32-bit mode since its only advantage is XMM16-XMM31 which aren't usable in 32-bit mode.

  VMOVPQIto64Zrr is a 64-bit mode only instruction, but we don't need to explicitly mark it as such because it uses a GR64 register which won't parse in 32-bit mode.
  llvm-svn: 334896
* [AMDGPU] setcc (select cc, CT, CF), CF, eq | ne -> xor cc, -1 | cc | Stanislav Mekhanoshin | 2018-06-16 | 1 | -17/+43
  This is the common case in the BE when we serialize condition and then rematerialize it. Use either original or inverted condition.
  Differential Revision: https://reviews.llvm.org/D48246
  llvm-svn: 334882
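The fold in the commit title above can be checked with a tiny truth-table model in Python. This is a sketch under one stated assumption: the two select arms CT and CF are distinct constants (otherwise comparing against CF says nothing about `cc`). The function names are hypothetical and do not correspond to LLVM APIs.

```python
def select(cc, ct, cf):
    """Model of (select cc, CT, CF)."""
    return ct if cc else cf


def setcc_original(cc, ct, cf, is_eq):
    """Model of setcc (select cc, CT, CF), CF, eq|ne before the combine."""
    value = select(cc, ct, cf)
    return (value == cf) if is_eq else (value != cf)


def setcc_folded(cc, is_eq):
    """After the combine: 'eq' gives the inverted condition (xor cc, -1),
    'ne' gives the original condition cc. Assumes CT != CF."""
    return (not cc) if is_eq else cc
```

Comparing the selected value with CF for equality is true exactly when `cc` was false, so `eq` yields the inverted condition and `ne` yields `cc` itself.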
* [globalisel][tablegen] Add support for C++ predicates on PatFrags and use it to support BFC on ARM. | Daniel Sanders | 2018-06-15 | 1 | -0/+10
  So far, we've only handled special cases of PatFrag like ImmLeaf. This patch adds support for the remaining cases using similar mechanisms.

  Like most C++ code from SelectionDAG, GISel and DAGISel expect to operate on different types and representations and as such the code is not compatible between the two. It's therefore necessary to add an alternative implementation in the GISelPredicateCode field.

  The target test for this feature could easily be done with IntImmLeaf and this would save on a little boilerplate. The reason I've chosen to implement this using PatFrag.GISelPredicateCode and not IntImmLeaf is because I was unable to find a rule that was blocked solely by lack of support for PatFrag predicates. I found that the ones I investigated as being likely candidates for the test were further blocked by other things.
  llvm-svn: 334871
* [X86] Add more instructions to the hasUndefRegUpdate list. | Craig Topper | 2018-06-15 | 1 | -0/+31
  Not sure any of these matter today because I don't think we ever produce them with IMPLICIT_DEF as an input. But by listing them we won't be surprised in the future.
  llvm-svn: 334867
* [PowerPC] Add support for high and higha symbol modifiers on tls modifiers. | Sean Fertile | 2018-06-15 | 1 | -0/+12
  Enables using the high and high-adjusted symbol modifiers on thread local storage modifiers in powerpc assembly. Needed to be able to support 64 bit thread-pointer and dynamic-thread-pointer access sequences.
  Differential Revision: https://reviews.llvm.org/D47754
  llvm-svn: 334856
* [PPC64] Support "symbol@high" and "symbol@higha" symbol modifiers. | Sean Fertile | 2018-06-15 | 4 | -0/+34
  Add support for the "@high" and "@higha" symbol modifiers in powerpc64 assembly. The modifiers represent accessing the segment consisting of bits 16-31 of a 64-bit address/offset.
  Differential Revision: https://reviews.llvm.org/D47729
  llvm-svn: 334855
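As a rough illustration of what the two modifiers in the commit above select, the Python sketch below follows the usual PowerPC ELF ABI definitions: "high" takes bits 16-31 directly, and "high-adjusted" first adds 0x8000 to compensate for a sign-extended low 16-bit half. The function names are made up for this example and are not LLVM or assembler APIs.

```python
def lo(value):
    """Bits 0-15 of the value."""
    return value & 0xFFFF


def high(value):
    """symbol@high: bits 16-31 of the value, taken as-is."""
    return (value >> 16) & 0xFFFF


def higha(value):
    """symbol@higha: bits 16-31, adjusted for a sign-extended low half."""
    return ((value + 0x8000) >> 16) & 0xFFFF


def sign_extend16(half):
    """Interpret a 16-bit field as a signed immediate, as addi-style
    instructions do."""
    return half - 0x10000 if half & 0x8000 else half
```

For `0x12348765` the low half `0x8765` is negative once sign-extended, so `higha` returns `0x1235` where `high` returns `0x1234`; adding `higha << 16` to the sign-extended low half reconstructs the original value.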
* [X86] Lowering sqrt intrinsics to native IR | Tomasz Krupa | 2018-06-15 | 3 | -58/+59
  Summary: Complementary patch to lowering sqrt intrinsics in Clang.
  Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k
  Reviewed By: craig.topper
  Subscribers: tkrupa, mike.dvoretsky, llvm-commits
  Differential Revision: https://reviews.llvm.org/D41599
  llvm-svn: 334849
* [X86] Prevent folding stack reloads into instructions in hasUndefRegUpdate. | Craig Topper | 2018-06-15 | 1 | -4/+10
  An earlier commit prevented folds from the peephole pass by checking for IMPLICIT_DEF. But later in the pipeline IMPLICIT_DEF just becomes an Undef flag on the input register, so we need to check for that case too.
  llvm-svn: 334848
* [AArch64][SVE] Asm: Support for CPY SIMD/FP and GPR instructions. | Sander de Smalen | 2018-06-15 | 2 | -0/+79
  Predicated splat/copy of SIMD/FP register or general purpose register to SVE vector, along with MOV-aliases.
  llvm-svn: 334842
* [AArch64][SVE] Asm: Support for INC/DEC (scalar) instructions. | Sander de Smalen | 2018-06-15 | 5 | -12/+105
  Increment/decrement scalar register by (scaled) element count given by predicate pattern, e.g. 'incw x0, all, mul #4'.
  Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar
  Reviewed By: SjoerdMeijer
  Differential Revision: https://reviews.llvm.org/D47713
  llvm-svn: 334838
* AMDGPU: Add combine for short vector extract_vector_elts | Matt Arsenault | 2018-06-15 | 1 | -1/+42
  Try to access pieces 4 bytes at a time. This helps various hasOneUse extract_vector_elt combines, such as load width reductions.
  Avoids test regressions in a future commit.
  llvm-svn: 334836
* AMDGPU: Make v4i16/v4f16 legal | Matt Arsenault | 2018-06-15 | 8 | -92/+235
  Some image loads return these, and it's awkward working around them not being legal.
  llvm-svn: 334835
* [AArch64][SVE] Asm: Support for FADD, FMUL and FMAX immediate instructions. | Sander de Smalen | 2018-06-15 | 3 | -0/+57
  Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar
  Reviewed By: javed.absar
  Differential Revision: https://reviews.llvm.org/D47712
  llvm-svn: 334831
* [mips] Add licensing information of the microMIPS tablegen files. (NFC) | Simon Dardis | 2018-06-15 | 2 | -0/+26
  llvm-svn: 334827
* [AArch64][SVE] Asm: Add parsing/printing support for exact FP immediates. | Sander de Smalen | 2018-06-15 | 8 | -46/+164
  Some instructions require one of a limited set of FP immediates as operands, for example '#0.5 or #1.0' for SVE's FADD instruction.

  This patch adds support for parsing and printing such FP immediates as exact values (e.g. #0.499999 is not accepted for #0.5).
  Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar
  Reviewed By: SjoerdMeijer
  Differential Revision: https://reviews.llvm.org/D47711
  llvm-svn: 334826
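The idea of "exact" matching in the commit above can be sketched in Python with exact rational arithmetic. This is only an illustration of the accept/reject rule, not the asmparser's implementation; `match_exact_fp_imm` and its `allowed` parameter are hypothetical names, and the default set mirrors the '#0.5 or #1.0' example for FADD.

```python
from fractions import Fraction


def match_exact_fp_imm(token, allowed=("0.5", "1.0")):
    """Return the allowed immediate that `token` equals exactly, else None.

    The token is parsed as an exact rational, so a near miss such as
    '#0.499999' is rejected rather than rounded to '#0.5'.
    """
    text = token.lstrip("#")
    try:
        value = Fraction(text)
    except (ValueError, ZeroDivisionError):
        return None  # not a number at all
    for candidate in allowed:
        if value == Fraction(candidate):
            return candidate
    return None
```

Here `match_exact_fp_imm("#0.5")` returns `"0.5"`, while `match_exact_fp_imm("#0.499999")` returns `None`.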
* [AMDGPU] Recognize x & ~(-1 << y) pattern. | Roman Lebedev | 2018-06-15 | 1 | -0/+6
  Summary: The same pattern as D48010, but this one is IR-canonical as of D47428.
  Reviewers: nhaehnle, bogner, tstellar, arsenm
  Reviewed By: arsenm
  Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #amdgpu
  Differential Revision: https://reviews.llvm.org/D48012
  llvm-svn: 334817
* [AMDGPU] Recognize x & ((1 << y) - 1) pattern. | Roman Lebedev | 2018-06-15 | 1 | -0/+7
  Summary: As a followup for D48007. Since we already handle `x << (bitwidth - y) >> (bitwidth - y)` pattern, which does not have ub for both the edge cases (`y == 0`, `y == bitwidth`), i think also handling a pattern that is ub for `y == bitwidth` should be fine.
  Reviewers: nhaehnle, bogner, tstellar, arsenm
  Reviewed By: arsenm
  Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #amdgpu
  Differential Revision: https://reviews.llvm.org/D48010
  llvm-svn: 334816
* [AMDGPU] Recognize x & (-1 >> (32 - y)) pattern. | Roman Lebedev | 2018-06-15 | 1 | -0/+7
  Summary: D47980 will canonicalize the `x << (32 - y) >> (32 - y)`, which is the pattern the AMDGPU expects to `x & (-1 >> (32 - y))`, which is not recognized by AMDGPU. Thus, it needs to be recognized, too.
  Reviewers: nhaehnle, bogner, tstellar, arsenm
  Reviewed By: arsenm
  Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #amdgpu
  Differential Revision: https://reviews.llvm.org/D48007
  llvm-svn: 334815
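The three AMDGPU commits above (r334815 to r334817) recognize different spellings of the same operation: keeping the low `y` bits of a 32-bit value. The Python sketch below models each spelling with explicit 32-bit masking and shows that they agree for `0 < y < 32`. It is an illustration only; the function names are hypothetical, and the edge cases `y == 0` and `y == 32` are deliberately left out because some of these shifts are out of range in 32-bit IR there.

```python
MASK32 = 0xFFFFFFFF  # models 32-bit wraparound; also the bit pattern of -1


def low_bits_shl_sub(x, y):
    """x & ((1 << y) - 1)"""
    return x & (((1 << y) - 1) & MASK32)


def low_bits_not_shl(x, y):
    """x & ~(-1 << y)"""
    return x & (~((MASK32 << y) & MASK32) & MASK32)


def low_bits_lshr(x, y):
    """x & (-1 >> (32 - y)), using a logical (unsigned) right shift"""
    return x & (MASK32 >> (32 - y))


def low_bits_shift_pair(x, y):
    """x << (32 - y) >> (32 - y), with a logical right shift"""
    return ((x << (32 - y)) & MASK32) >> (32 - y)
```

For `x = 0xDEADBEEF` and `y = 8`, every variant returns `0xEF`.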
* Revert r334802 "[X86] Prevent folding stack reloads with instructions that have an undefined register update." | Craig Topper | 2018-06-15 | 1 | -7/+4
  There's a typo causing the build to fail.
  llvm-svn: 334803
* [X86] Prevent folding stack reloads with instructions that have an undefined register update. | Craig Topper | 2018-06-15 | 1 | -4/+7
  We want to keep the load unfolded so we can use the same register for both sources to avoid a false dependency.
  llvm-svn: 334802
* [X86] Add more instructions to the memory folding tables using the autogenerated table as a guide. | Craig Topper | 2018-06-15 | 1 | -1/+220
  I think this covers most of the unmasked vector instructions. We're still missing a lot of the masked instructions.

  There are some test changes here because of the new folding support. I don't think these particular cases should be folded because it creates an undef register dependency. I think the changes introduced in r334175 are not handling stack folding. They're only blocking the peephole pass.
  llvm-svn: 334800
* [X86] Add 'Z' to the internal names of various EVEX instructions for overall consistency. | Craig Topper | 2018-06-15 | 3 | -76/+76
  llvm-svn: 334785
* [x86] be more selective about converting 'and' to shuffle (PR37749) | Sanjay Patel | 2018-06-14 | 1 | -0/+6
  isVectorClearMaskLegal() is the TLI hook used by the generic DAGCombiner::XformToShuffleWithZero().

  We've grown to accommodate/expect this transform to shuffle (disabling it more generally results in many regressions). So I'm narrowly excluding the 256-bit types that clearly are not worthwhile for AVX1.

  I think in most cases we are able to recover by converting the shuffle back into 'and' ops, but the cases in:
  https://bugs.llvm.org/show_bug.cgi?id=37749
  ...show that there are cracks.
  llvm-svn: 334759
* [X86] Fix stale comment in folding tables. | Craig Topper | 2018-06-14 | 1 | -3/+3
  llvm-svn: 334758
* AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.cvt.pkrtz | Tom Stellard | 2018-06-14 | 3 | -0/+49
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45907
  llvm-svn: 334757
* [X86] Add more vector instructions to the memory folding table using the autogenerated table as a guide. | Craig Topper | 2018-06-14 | 1 | -1/+215
  The test change is because we now fold stack reload into RNDSCALE and RNDSCALE can be turned into ROUND by EVEX->VEX.
  llvm-svn: 334728
* [X86] Remove '128' from the internal name of some scalar FP instructions to be consistent with other scalar instructions. | Craig Topper | 2018-06-14 | 1 | -8/+8
  llvm-svn: 334727
* [X86] Disable load unfolding for a bunch of instructions where unfolding would increase the size of the load. | Craig Topper | 2018-06-14 | 1 | -16/+16
  Found by an audit of the manual table vs the autogenerated table.
  llvm-svn: 334726
* [X86] Remove NotMemoryFoldable from some AVX/AVX512 scalar instructions. | Craig Topper | 2018-06-14 | 2 | -15/+14
  Some of these instructions are already in the manual folding table so we should have them in the auto table too.
  llvm-svn: 334725
* [mips] Correct predicates for MSA pseudo instructions | Simon Dardis | 2018-06-14 | 1 | -1/+2
  llvm-svn: 334708
* [x86] fix mappings of cvttp2si/cvttp2ui x86 intrinsics to x86-specific nodes and isel patterns (PR37551) | Craig Topper | 2018-06-14 | 4 | -43/+178
  Summary: The tests in:
  https://bugs.llvm.org/show_bug.cgi?id=37551
  ...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes.

  This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll
  Reviewers: RKSimon, gbedwell, spatel
  Reviewed By: spatel
  Subscribers: mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47993
  llvm-svn: 334685
* AMDGPU/GlobalISel: Implement select() for 32-bit G_FADD and G_FMUL | Tom Stellard | 2018-06-13 | 3 | -0/+16
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46171
  llvm-svn: 334665