path: root/llvm/lib/Target
Commit message | Author | Date | Files | Lines (-removed/+added)
* [X86] Make %eiz usage in 64-bit mode force a 0x67 address size prefix. Fix some test CHECK lines. (Craig Topper, 2018-06-23; 1 file, -0/+2)
  llvm-svn: 335414
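  An assumed illustration of the intended behavior (not taken from the commit's tests): an explicit %eiz index marks the address as 32-bit, so in 64-bit mode the encoder should emit the 0x67 address-size override:
      movl (%eax,%eiz,1), %ebx   # 32-bit address form; now encoded with a 0x67 prefix
      movl (%rax), %ebx          # ordinary 64-bit addressing; no prefix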
* [X86] Teach disassembler to use %eip instead of %rip when 0x67 prefix is used on a rip-relative address. (Craig Topper, 2018-06-23; 1 file, -1/+3)
  llvm-svn: 335413
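  A sketch of the effect on disassembly (byte sequence chosen for illustration, not from the commit):
      48 8b 05 00 00 00 00      movq (%rip), %rax
      67 48 8b 05 00 00 00 00   movq (%eip), %rax   # previously printed with (%rip)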
* [X86][AsmParser] Improve base/index register checks. (Craig Topper, 2018-06-23; 1 file, -8/+29)
  - Ensure EIP isn't used with an index register.
  - Ensure EIP isn't used as an index register.
  - Ensure the base register isn't a vector register.
  - Ensure eiz/riz usage matches the size of their base register.
  llvm-svn: 335412
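  Assumed examples of operands the new checks are meant to reject (not taken from the patch's tests):
      movl (%eip,%ebx), %eax    # error: %eip can't be used together with an index register
      movl (%rax,%eip), %eax    # error: %eip can't be an index register
      movl (%xmm0), %eax        # error: a vector register can't be a base register
      movl (%rax,%eiz,1), %eax  # error: %eiz doesn't match the 64-bit base register (%riz would)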
* [AMDGPU] Update includes for intrinsic changes :( (Reid Kleckner, 2018-06-23; 2 files, -4/+4)
  llvm-svn: 335409
* [IR] Split Intrinsics.inc into enums and implementations (Reid Kleckner, 2018-06-23; 1 file, -1/+2)
  Implements PR34259.
  Intrinsics.h is a very popular header. Most LLVM TUs care about things like dbg_value, but they don't care how they are implemented. After I split these out, IntrinsicImpl.inc is 1.7 MB, so this saves each LLVM TU from scanning 1.7 MB of source that gets pre-processed away. It also means we can modify intrinsic properties without triggering a full rebuild, but that's probably less of a win.
  I think the next best thing to do would be to split out the target intrinsics into their own header. Very, very few TUs care about target-specific intrinsics. It's very hard to split up the target independent intrinsics like llvm.expect, assume, and dbg.value, though.
  llvm-svn: 335407
* [X86][AsmParser] Rework that allows (%dx) to be used in place of %dx with in/out instructions. (Craig Topper, 2018-06-23; 1 file, -41/+29)
  Previously, to support (%dx) we left a wide open hole in our 16-bit memory address checking. This let this address value be used with any instruction without error in the parser. It would later fail in the encoder with an assertion failure on debug builds and who knows what on release builds.
  This patch passes the mnemonic down to the memory operand parsing function so we can allow the (%dx) form only on specific instructions.
  llvm-svn: 335403
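  A sketch of the tightened behavior (example operands assumed, exact diagnostics may differ):
      inb  (%dx), %al      # accepted: (%dx) is treated as %dx for in/out
      outb %al, (%dx)      # accepted
      movb (%dx), %al      # now rejected in the parser instead of asserting later in the encoder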
* [X86][AsmParser] Keep track of whether an explicit scale was specified while parsing an address in Intel syntax. Use it for improved error checking. (Craig Topper, 2018-06-22; 1 file, -8/+16)
  This allows us to check these:
  - 16-bit addressing doesn't support scale, so we should error if we find one there.
  - Multiplying ESP/RSP by a scale, even if the scale is 1, should be an error because ESP/RSP can't be an index.
  llvm-svn: 335398
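  Hypothetical Intel-syntax inputs the new checks should now diagnose (not taken from the patch's tests):
      mov ax, [bx*2]      ; error: scale is not supported in 16-bit addressing
      mov eax, [esp*1]    ; error: ESP can't be an index register, even with a scale of 1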
* [X86][AsmParser] In Intel syntax make sure we support ESP/RSP being the second register in memory expressions like [EAX+ESP]. (Craig Topper, 2018-06-22; 1 file, -0/+4)
  By default, the second register gets assigned to the index register slot. But ESP can't be an index register, so we need to swap it with the other register.
  There's still a slight bug in that we allow [EAX+ESP*1]. The existence of the multiply, even though it's by 1, should force ESP to the index register and trigger an error, but it doesn't currently.
  llvm-svn: 335394
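  The described behavior, sketched with assumed Intel-syntax examples:
      mov eax, [EAX+ESP]     ; accepted: ESP is swapped into the base slot, EAX becomes the index
      mov eax, [EAX+ESP*1]   ; still accepted for now, although the explicit scale should force an error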
* [X86] Don't accept (%si,%bp) 16-bit address expressions. (Craig Topper, 2018-06-22; 1 file, -4/+9)
  The second register is the index register and should only be %si or %di if used with a base register. And in that case the base register should be %bp or %bx. This makes us compatible with gas.
  We do still need to support both orders with Intel syntax, which uses [bp+si] and [si+bp].
  llvm-svn: 335384
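  Illustrative AT&T-syntax forms under the new rule (examples assumed): the base must be %bp or %bx and the index must be %si or %di:
      movw (%bx,%si), %ax    # accepted
      movw (%bp,%di), %ax    # accepted
      movw (%si,%bp), %ax    # now rejected, matching gas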
* [X86][AsmParser] Allow (%bp,%si) and (%bp,%di) to be encoded without using a zero displacement. (Craig Topper, 2018-06-22; 1 file, -1/+1)
  (%bp) can't be encoded without a displacement; that encoding is instead used for a displacement alone, so a 1-byte displacement of 0 must be used. But if there is an index register we can encode without a displacement.
  llvm-svn: 335379
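  A sketch of the encoding consequence described above (examples assumed):
      movw (%bp), %ax        # must be emitted as 0(%bp), i.e. with a one-byte displacement of zero
      movw (%bp,%si), %ax    # with an index register present, no displacement byte is needed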
* [X86][AsmParser] Check for invalid 16-bit base register in Intel syntax. (Craig Topper, 2018-06-22; 1 file, -19/+24)
  llvm-svn: 335373
* [X86] Don't allow ESP/RSP to be used as an index register in assembly. (Craig Topper, 2018-06-22; 1 file, -1/+2)
  Fixes PR37892.
  llvm-svn: 335370
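  Illustrative forms (assumed, not taken from the PR37892 test case):
      leal (%eax,%esp), %ebx   # error: %esp can't be an index register
      leal (%esp,%eax), %ebx   # fine: %esp as base, %eax as index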
* Recommit of r335326, with the test that I missed now fixed. (Sjoerd Meijer, 2018-06-22; 1 file, -3/+6)
  llvm-svn: 335331
* [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc (Simon Pilgrim, 2018-06-22; 1 file, -17/+32)
  AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) were being severely overestimated by the default shuffle expansion.
  This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174.
  I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more.
  Differential Revision: https://reviews.llvm.org/D48172
  llvm-svn: 335329
* Reverting r335326 while I look at the test failure (Sjoerd Meijer, 2018-06-22; 1 file, -6/+3)
  llvm-svn: 335328
* [ARM] ARMv6m and v8m.baseline strict align (Sjoerd Meijer, 2018-06-22; 1 file, -3/+6)
  This sets target feature FeatureStrictAlign for Armv6-m and Armv8-m.baseline, because they have no support for unaligned accesses.
  It looks like we always pass target feature "+strict-align" from Clang, so this is not a user facing problem, but querying the subtarget (in e.g. llc) for unaligned access support is incorrect.
  Differential Revision: https://reviews.llvm.org/D48437
  llvm-svn: 335326
* AMDGPU: Add patterns for i32/i64 local atomic load/store (Matt Arsenault, 2018-06-22; 4 files, -1/+54)
  Not sure why the 32/64 split is needed in the atomic_load/store hierarchies. The regular PatFrags do this, but we don't do it for the existing handling for global.
  llvm-svn: 335325
* [X86] Changing the check for valid inputs in combineScalarToVector (Mikhail Dvoretckii, 2018-06-22; 1 file, -5/+6)
  Changing the logic of scalar mask folding to check for valid input types rather than against invalid ones, making it more robust and fixing PR37879.
  Differential Revision: https://reviews.llvm.org/D48366
  llvm-svn: 335323
* AMDGPU/GlobalISel: Default to using TableGen'd instruction selector (Tom Stellard, 2018-06-22; 1 file, -7/+0)
  Summary: We can select all instructions that are marked as legal in a full piglit run, so now is a good time to make the TableGen'd instruction selector default for all opcodes. This is NFC for a full piglit run, which is why there are no tests.
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48198
  llvm-svn: 335319
* AMDGPU/GlobalISel: legalize and select 32-bit G_ASHR (Tom Stellard, 2018-06-22; 4 files, -0/+47)
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D48196
  llvm-svn: 335318
* AMDGPU/GlobalISel: legalize and select 32-bit G_SITOFP (Tom Stellard, 2018-06-22; 4 files, -0/+18)
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48195
  llvm-svn: 335316
* AMDGPU/GlobalISel: Implement select() for COPY (Tom Stellard, 2018-06-22; 1 file, -1/+4)
  Reviewers: arsenm, nhaehnle
  Reviewed By: nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46151
  llvm-svn: 335315
* AMDGPU/GlobalISel: Implement select() for G_IMPLICIT_DEF (Tom Stellard, 2018-06-21; 2 files, -0/+16)
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46150
  llvm-svn: 335307
* Revert r335297 "[X86] Implement more of x86-64 large and medium PIC code models" (Reid Kleckner, 2018-06-21; 6 files, -115/+29)
  MCJIT can't handle R_X86_64_GOT64 yet.
  llvm-svn: 335300
* [X86] Commit some comments that weren't in the medium code model patch (Reid Kleckner, 2018-06-21; 1 file, -4/+4)
  llvm-svn: 335298
* [X86] Implement more of x86-64 large and medium PIC code models (Reid Kleckner, 2018-06-21; 6 files, -27/+113)
  Summary:
  The large code model allows code and data segments to exceed 2GB, which means that some symbol references may require a displacement that cannot be encoded as a displacement from RIP. The large PIC model even relaxes the assumption that the GOT itself is within 2GB of all code. Therefore, we need a special code sequence to materialize it:
      .LtmpN:
        leaq .LtmpN(%rip), %rbx
        movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax  # Scratch
        addq %rax, %rbx                              # GOT base reg
  From that, non-local references go through the GOT base register instead of being PC-relative loads. Local references typically use GOTOFF symbols, like this:
      movq extern_gv@GOT(%rbx), %rax
      movq local_gv@GOTOFF(%rbx), %rax
  All calls end up being indirect:
      movabsq $local_fn@GOTOFF, %rax
      addq %rbx, %rax
      callq *%rax
  The medium code model retains the assumption that the code segment is less than 2GB, so calls are once again direct, and the RIP-relative loads can be used to access the GOT. Materializing the GOT is easy:
      leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx  # GOT base reg
  DSO local data accesses will use it:
      movq local_gv@GOTOFF(%rbx), %rax
  Non-local data accesses will use RIP-relative addressing, which means we may not always need to materialize the GOT base:
      movq extern_gv@GOTPCREL(%rip), %rax
  Direct calls are basically the same as they are in the small code model: they use direct, PC-relative addressing, and the PLT is used for calls to non-local functions.
  This patch adds reasonably comprehensive testing of LEA, but there are lots of interesting folding opportunities that are unimplemented.
  Reviewers: chandlerc, echristo
  Subscribers: hiraditya, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47211
  llvm-svn: 335297
* AMDGPU: Remove ability to reserve VGPRs for debugger (Konstantin Zhuravlyov, 2018-06-21; 6 files, -50/+2)
  Differential Revision: https://reviews.llvm.org/D48234
  llvm-svn: 335288
* [AMDGPU] Update assembler for HSA Code Object v3 (Scott Linder, 2018-06-21; 6 files, -75/+698)
  Update AMDGPU assembler syntax behind the code-object-v3 feature:
  * Replace/rename most AMDGPU assembler directives/symbols and document them.
  * Provide more diagnostics (e.g. values out of range, missing values, repeated values).
  * Provide path for backwards compatibility, even with underlying descriptor changes.
  Differential Revision: https://reviews.llvm.org/D47736
  llvm-svn: 335281
* [mips] Modify comment to test new email address (NFC). (Simon Dardis, 2018-06-21; 1 file, -1/+1)
  llvm-svn: 335269
* [AMDGPU] Fix bug with tracking processed blocks in SIInsertWaitcnts (Scott Linder, 2018-06-21; 1 file, -0/+1)
  BlockWaitcntProcessedSet was not being cleared between calls, so it was producing incorrect counts in cases where MBB addresses happened to coincide across multiple calls.
  Differential Revision: https://reviews.llvm.org/D48391
  llvm-svn: 335268
* AMDGPU/AMDHSA: Remove GridWorkGroupCountX/Y/Z (Konstantin Zhuravlyov, 2018-06-21; 5 files, -51/+0)
  Remove it and everything that comes with it from the implementation and v3 header files. Leave the definition in v2 header files for backwards compatibility.
  Differential Revision: https://reviews.llvm.org/D48191
  llvm-svn: 335267
* Revert "[AArch64] Coalesce Copy Zero during instruction selection"Sirish Pande2018-06-211-29/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit d8f57105010cc7e78026e511d5def873fc91e0e7. Original Commit: Author: Haicheng Wu <haicheng@codeaurora.org> Date: Sun Feb 18 13:51:33 2018 +0000 [AArch64] Coalesce Copy Zero during instruction selection Add special case for copy of zero to avoid a double copy. Differential Revision: https://reviews.llvm.org/D36104 Author's intention is to remove a BB that has one mov instruction. In order to do that, d8f571050 pessmizes MachineSinking by introducing a copy, such that mov instruction is NOT moved to the BB. Optimization downstream gets rid of the BB with only mov instruction. This works well if we have only one fall through branch as there is only one "extra" mov instruction. If we have multiple fall throughs, we will have a lot of redundant movs. In such a case, it's better to have this BB which has one mov instruction. This is causing degradation in jpeg, fft and other codebases. I believe if we want to remove a BB with only one branch instruction, we should not pessimize Machine Sinking at all, and find some other solution. llvm-svn: 335251
* [ARM] Enable useAA() for the in-order Cortex-R52 (David Green, 2018-06-21; 2 files, -1/+13)
  This option allows codegen (such as DAGCombine or MI scheduling) to use alias analysis information, which can help with the codegen on in-order cpus, especially machine scheduling. Here I have done things the same way as AArch64, adding a subtarget feature to enable this for specific cores, and enabled it for the R52 where we have a schedule to make use of it.
  Differential Revision: https://reviews.llvm.org/D48074
  llvm-svn: 335249
* [RISCV] Tail calls don't need to save return address (Sameer AbuAsal, 2018-06-21; 1 file, -2/+6)
  Summary: When expanding the PseudoTail in expandFunctionCall() we were using X6 to save the return address. Since this is a tail call the return address is not needed; this patch replaces it with X0, which is ignored. This matches the behaviour listed in the ISA V2.2 document, page 110:
      tail offset  ----->  jalr x0, x6, offset
  GCC exhibits the same behavior.
  Reviewers: apazos, asb, mgrang
  Reviewed By: asb
  Subscribers: rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01
  Differential Revision: https://reviews.llvm.org/D48343
  llvm-svn: 335239
* [x86] Lower some trunc + shuffle patterns to vpmov[q|d][b|w] (Mikhail Dvoretckii, 2018-06-21; 1 file, -7/+118)
  This should help in lowering the following four intrinsics:
    _mm256_cvtepi32_epi8
    _mm256_cvtepi64_epi16
    _mm256_cvtepi64_epi8
    _mm512_cvtepi64_epi8
  Differential Revision: https://reviews.llvm.org/D46957
  llvm-svn: 335238
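  An assumed sketch of the kind of selection this enables (exact register forms depend on the available AVX512 features):
      vpmovdb %ymm0, %xmm0    # _mm256_cvtepi32_epi8
      vpmovqw %ymm0, %xmm0    # _mm256_cvtepi64_epi16
      vpmovqb %ymm0, %xmm0    # _mm256_cvtepi64_epi8
      vpmovqb %zmm0, %xmm0    # _mm512_cvtepi64_epi8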
* AMDGPU: Remove redundant MIMG instruction variants (Nicolai Haehnle, 2018-06-21; 1 file, -20/+67)
  Summary: For sample and gather ops, we can accurately determine the set of vaddr-size instruction variants that are required. This reduces the size of instruction tables by ~5%. The number of machine instruction opcodes is reduced from 10002 to 9476.
  Change-Id: Ie7fc65d3657b762c7816017fe70b2e9bec644a8a
  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D48168
  llvm-svn: 335232
* AMDGPU: Remove old-style image intrinsics (Nicolai Haehnle, 2018-06-21; 6 files, -995/+1)
  Summary: This also removes the need for atomic pseudo instructions, since we select the correct encoding directly in SITargetLowering::lowerImage for dimension-aware image intrinsics.
  Mesa uses dimension-aware image intrinsics since commit a9a7993441.
  Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a
  Reviewers: arsenm, rampitec, mareko, tpr, b-sumner
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48167
  llvm-svn: 335231
* AMDGPU: Select MIMG instructions manually in SITargetLowering (Nicolai Haehnle, 2018-06-21; 8 files, -230/+345)
  Summary: Having TableGen patterns for image intrinsics is hitting limitations: for D16 we already have to manually pre-lower the packing of data values, and we will have to do the same for A16 eventually.
  Since there is already some custom C++ code anyway, it is arguably easier to just do everything in C++, now that we can use the beefed-up generic tables backend of TableGen to provide all the required metadata and map intrinsics to corresponding opcodes. With this approach, all image intrinsic lowering happens in SITargetLowering::lowerImage. That code is dense due to all the cases that it handles, but it should still be easier to follow than what we had before, by virtue of it all being done in a single location, and by virtue of not relying on the TableGen pattern magic that very few people really understand.
  This means that we will have MachineSDNodes with MIMG instructions during DAG combining, but that seems alright: previously we had intrinsic nodes instead, but those are similarly opaque to the generic CodeGen infrastructure, and the final pattern matching just did a 1:1 translation to machine instructions anyway. If anything, the fact that we now merge the address words into a vector before DAG combine should be an advantage.
  Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6
  Reviewers: arsenm, rampitec, rtaylor, tstellar
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48017
  llvm-svn: 335228
* AMDGPU: Refactor MIMG instruction TableGen using generic tables (Nicolai Haehnle, 2018-06-21; 10 files, -442/+298)
  Summary: This allows us to access rich information about MIMG opcodes from C++ code. Simplifying the mapping between equivalent opcodes of different data size becomes quite natural.
  This also flattens the MIMG-related class and multiclass hierarchy a little, and collapses together some of the scaffolding for sample and gather4 opcodes.
  Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd
  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48016
  llvm-svn: 335227
* AMDGPU: Use generic tables instead of SearchableTable (Nicolai Haehnle, 2018-06-21; 5 files, -22/+37)
  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48014
  Change-Id: Ibb43f90d955275571aff17d0c3ecfb5e5b299641
  llvm-svn: 335226
* AMDGPU: Pass AMDGPUSampleVariant to MIMG_{Sampler,Gather}(_WQM) (Nicolai Haehnle, 2018-06-21; 1 file, -69/+73)
  Summary: This will allow us to provide rich metadata about the instructions in tables that are accessible by custom C++ code.
  Change-Id: Id9305a26304ab6a6cceb6c65c8cd49141cc0101d
  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48011
  llvm-svn: 335224
* AMDGPU: Add implicit def of SCC to kill and indirect pseudos (Nicolai Haehnle, 2018-06-21; 1 file, -2/+10)
  Summary: Kill instructions sometimes do use SCC in unusual circumstances, when v_cmpx cannot be used due to the operands that are involved.
  Additionally, even if SCC was never defined by the expansion, kill pseudos could previously occur between an s_cmp and an s_cbranch_scc, which breaks the SCC liveness tracking when the pseudo is expanded to split the basic block. While it would be possible to explicitly mark the SCC as live-in for the successor basic block, it's simpler to just mark the pseudo as using SCC, so that such a sequence is never emitted by instruction selection in the first place.
  A similar issue affects indirect source/dest pseudos in principle, although I haven't been able to come up with a test case where it actually matters (this affects instruction selection, so a MIR test can't be used).
  Fixes: dEQP-GLES3.functional.shaders.discard.dynamic_loop_always
  Change-Id: Ica8d82ecff1a763b892a1112cf1b06c948863a4f
  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47761
  llvm-svn: 335223
* AMDGPU: Turn D16 for MIMG instructions into a regular operand (Nicolai Haehnle, 2018-06-21; 13 files, -396/+297)
  Summary: This allows us to reduce the number of different machine instruction opcodes, which reduces the table sizes and helps flatten the TableGen multiclass hierarchies.
  We can do this because for each hardware MIMG opcode, we have a full set of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata and vaddr registers. Instead of having separate D16 machine instructions, a packed D16 instruction loading e.g. 4 components can simply use the same V2 opcode variant that non-D16 instructions use.
  We still require a TSFlag for D16 buffer instructions, because the D16-ness of buffer instructions is part of the opcode. Renaming the flag should help avoid future confusion.
  The one non-obvious code change is that for gather4 instructions, the disassembler can no longer automatically decide whether to use a V2 or a V4 variant. The existing logic which chooses the correct variant for other MIMG instructions is extended to cover gather4 as well.
  As a bonus, some of the assembler error messages are now more helpful (e.g., complaining about a wrong data size instead of a non-existing instruction).
  While we're at it, delete a whole bunch of dead legacy TableGen code.
  Change-Id: I89b02c2841c06f95e662541433e597f5d4553978
  Reviewers: arsenm, rampitec, kzhuravl, artem.tamazov, dp, rtaylor
  Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47434
  llvm-svn: 335222
* [X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882) (Simon Pilgrim, 2018-06-21; 1 file, -4/+4)
  These were being overly cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does.
  llvm-svn: 335216
* [X86] Remove masking from 512-bit floating max/min intrinsics. Use select instruction instead. (Craig Topper, 2018-06-21; 1 file, -8/+4)
  llvm-svn: 335199
* [mips] Add microMIPS specific addressing patterns. (Simon Dardis, 2018-06-20; 2 files, -91/+113)
  These are identical but use microMIPS instructions instead of MIPS instructions.
  Also, flatten the 'let AdditionalPredicates = [InMicroMips]' by using the ISA_MICROMIPS adjective. Add tests for constant materialization.
  Reviewers: atanasyan, abeserminji, smaksimovic
  Differential Revision: https://reviews.llvm.org/D48275
  llvm-svn: 335185
* [X86] Use setcc ISD opcode for AVX512 integer comparisons all the way to isel (Craig Topper, 2018-06-20; 7 files, -166/+264)
  I don't believe there is any real reason to have separate X86-specific opcodes for vector compares. Setcc has the same behavior, just using a different encoding for the condition code.
  I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering.
  Differential Revision: https://reviews.llvm.org/D43608
  llvm-svn: 335173
* [mips] Correct predicates for loads, bit manipulation instructions and some pseudos (Simon Dardis, 2018-06-20; 4 files, -22/+33)
  Additionally, correct the definition of the rdhwr instruction.
  Reviewers: atanasyan, abeserminji, smaksimovic
  Differential Revision: https://reviews.llvm.org/D48216
  llvm-svn: 335162
* AMDGPU: Fix scalar_to_vector for v4i16/v4f16 (Matt Arsenault, 2018-06-20; 2 files, -3/+12)
  llvm-svn: 335161
* AMDGPU: Fix missing C++ mode comment (Matt Arsenault, 2018-06-20; 1 file, -1/+1)
  llvm-svn: 335160