path: root/llvm/lib/Target
Commit message (Author, Date, Files, Lines -deleted/+added)
...
* [X86] Allow base and index for gather instructions to appear in other order for Intel syntax. (Craig Topper, 2018-06-25, 1 file, -0/+11)
  llvm-svn: 335500
* Add Triple::isMIPS()/isMIPS32()/isMIPS64(). NFC (Alexander Richardson, 2018-06-25, 4 files, -13/+6)
  There are quite a few if statements that enumerate all these cases. It
  gets even worse in our fork of LLVM, where we also have a Triple::cheri
  (which is mips64 + CHERI instructions) and we had to update all if
  statements that check for Triple::mips64 to also handle Triple::cheri.
  This patch helps to reduce our diff to upstream and should also make
  some checks more readable.
  Reviewed By: atanasyan
  Differential Revision: https://reviews.llvm.org/D48548
  llvm-svn: 335493
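  To illustrate the cleanup this enables, a minimal sketch (the isMIPS()
  predicate comes from this patch; the surrounding functions are invented
  for the example):

    #include "llvm/ADT/Triple.h"
    using llvm::Triple;

    // Before: every caller enumerates the four MIPS variants by hand.
    static bool isMipsOld(const Triple &T) {
      return T.getArch() == Triple::mips || T.getArch() == Triple::mipsel ||
             T.getArch() == Triple::mips64 || T.getArch() == Triple::mips64el;
    }

    // After: one readable query; downstream forks extend it in one place.
    static bool isMipsNew(const Triple &T) { return T.isMIPS(); }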
* AMDGPU/GlobalISel: Add support for llvm.amdgcn.kernarg.segment.ptr (Matt Arsenault, 2018-06-25, 3 files, -1/+29)
  Note a normal select test is not currently possible because this relies
  on input registers tracked in SIMachineFunctionInfo which are not
  currently serializable in MIR, but this does work end-to-end from the IR.
  llvm-svn: 335490
* AMDGPU: Remove commented out code (Matt Arsenault, 2018-06-25, 1 file, -2/+0)
  llvm-svn: 335486
* AMDGPU/GlobalISel: Fix G_IMPLICIT_DEF for pointers (Matt Arsenault, 2018-06-25, 1 file, -1/+5)
  llvm-svn: 335485
* AMDGPU: Respect align argument parameter (Matt Arsenault, 2018-06-25, 2 files, -10/+19)
  This should avoid relying on the pointee type to get the alignment,
  particularly since pointee types are supposed to be removed at some
  point. Also fixes not getting the alignment for unsized types.
  llvm-svn: 335478
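  A hedged sketch of the general idea, not the patch's actual code (the
  helper name is invented; the IR APIs are real):

    #include "llvm/IR/DataLayout.h"
    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    static unsigned pointerArgAlign(const CallInst &CI, unsigned ArgNo,
                                    const DataLayout &DL) {
      // Prefer the explicit `align` attribute on the call-site argument;
      // it stays meaningful once pointee types go away.
      if (unsigned A = CI.getParamAlignment(ArgNo))
        return A;
      // The old fallback -- querying the pointee's ABI alignment -- is
      // only legal for sized types, which is the bug being fixed here.
      Type *ElemTy =
          CI.getArgOperand(ArgNo)->getType()->getPointerElementType();
      return ElemTy->isSized() ? DL.getABITypeAlignment(ElemTy) : 1;
    }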
* [X86] Block commuting operand 1 of FMA*_Int instructions in findThreeSrcCommutedOpIndices. Remove uncommutable returns from getThreeSrcCommuteCase/getFMA3OpcodeToCommuteOperands. (Craig Topper, 2018-06-25, 2 files, -88/+47)
  We should be blocking the operand while we are in the routine that tries
  to find commutable operand indices. Doing it later means we might have
  missed out on another valid set of operands we could have commuted. The
  intrinsic case was the only case that could really prevent commuting in
  getFMA3OpcodeToCommuteOperands. All the other cases in
  getThreeSrcCommuteCase were not reachable conditions as they were
  protected by findThreeSrcCommutedOpIndices. With that abort case pushed
  earlier, we can remove all the abort checks and replace with asserts.
  llvm-svn: 335446
* [WebAssembly] Add WebAssemblyException information analysis (Heejin Ahn, 2018-06-25, 5 files, -0/+370)
  Summary: A WebAssemblyException object contains BBs that belong to a
  'catch' part of the try-catch-end structure. Because CFGSort requires
  all the BBs within a catch part to be sorted together, as it does for
  loops, this pass calculates the nesting structure of the catch parts of
  exceptions in a function. For now, this assumes the use of Windows EH
  instructions.
  Reviewers: dschuff, majnemer
  Subscribers: jfb, mgorny, sbc100, jgravelle-google, sunfish, llvm-commits
  Differential Revision: https://reviews.llvm.org/D44134
  llvm-svn: 335439
* [WebAssembly] Add WebAssemblyLateEHPrepare pass (Heejin Ahn, 2018-06-25, 5 files, -93/+388)
  Summary: Add WebAssemblyLateEHPrepare pass that does several small jobs
  for exception handling. This runs before CFGSort, and is different from
  the WasmEHPrepare pass that runs before ISel, even though the names are
  similar.
  Reviewers: dschuff, majnemer
  Subscribers: sbc100, jgravelle-google, sunfish, mgorny, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46803
  llvm-svn: 335438
* [X86] Simplify some code by using isOneConstant. NFC (Craig Topper, 2018-06-25, 1 file, -2/+1)
  llvm-svn: 335437
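  What such a simplification typically looks like -- a hedged sketch with
  invented operands, not the patch's actual lines:

    #include "llvm/CodeGen/SelectionDAGNodes.h"
    using namespace llvm;

    static bool secondOperandIsOneVerbose(SDValue N) {
      // Before: spell out the cast and the value check at each call site.
      if (auto *C = dyn_cast<ConstantSDNode>(N.getOperand(1)))
        return C->isOne();
      return false;
    }

    static bool secondOperandIsOneShort(SDValue N) {
      // After: isOneConstant() does both steps, including the null case.
      return isOneConstant(N.getOperand(1));
    }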
* [X86] Remove the changes to combineScalarToVector made in r335037. (Craig Topper, 2018-06-25, 1 file, -26/+3)
  They appear to be untested other than the test case for pr37879.ll, and
  I believe we should be using SimplifyDemandedElts here to handle these
  cases.
  llvm-svn: 335436
* [X86] Reduce the number of patterns needed for masked scalar ceil/floor isel. (Craig Topper, 2018-06-25, 1 file, -35/+10)
  The scalar to vector on the mask register should not be part of the
  patterns.
  llvm-svn: 335435
* [mips][ias] Enable IAS by default for OpenBSD / FreeBSD mips64/mips64el. (Brad Smith, 2018-06-24, 1 file, -0/+5)
  Reviewers: atanasyan
  Differential Revision: https://reviews.llvm.org/D31557
  llvm-svn: 335434
* [X86] Regroup some isel patterns. NFC (Craig Topper, 2018-06-24, 1 file, -32/+22)
  For some reason the 64-bit patterns were separated from their
  8/16/32-bit friends, but only for add/sub/mul. For and/or/xor they were
  together.
  llvm-svn: 335429
* [X86] Rename VFPCLASSSS and VFPCLASSSD internal instruction names to include a Z to match other EVEX instructions. (Craig Topper, 2018-06-24, 3 files, -14/+14)
  llvm-svn: 335428
* [X86] Make %eiz usage in 64-bit mode force a 0x67 address size prefix. Fix some test CHECK lines. (Craig Topper, 2018-06-23, 1 file, -0/+2)
  llvm-svn: 335414
* [X86] Teach disassembler to use %eip instead of %rip when 0x67 prefix is used on a rip-relative address. (Craig Topper, 2018-06-23, 1 file, -1/+3)
  llvm-svn: 335413
* [X86][AsmParser] Improve base/index register checks. (Craig Topper, 2018-06-23, 1 file, -8/+29)
  - Ensure EIP isn't used with an index register.
  - Ensure EIP isn't used as an index register.
  - Ensure the base register isn't a vector register.
  - Ensure eiz/riz usage matches the size of their base register.
  llvm-svn: 335412
* [AMDGPU] Update includes for intrinsic changes :( (Reid Kleckner, 2018-06-23, 2 files, -4/+4)
  llvm-svn: 335409
* [IR] Split Intrinsics.inc into enums and implementations (Reid Kleckner, 2018-06-23, 1 file, -1/+2)
  Implements PR34259.
  Intrinsics.h is a very popular header. Most LLVM TUs care about things
  like dbg_value, but they don't care how they are implemented. After I
  split these out, IntrinsicImpl.inc is 1.7 MB, so this saves each LLVM TU
  from scanning 1.7 MB of source that gets pre-processed away. It also
  means we can modify intrinsic properties without triggering a full
  rebuild, but that's probably less of a win.
  I think the next best thing to do would be to split out the target
  intrinsics into their own header. Very, very few TUs care about
  target-specific intrinsics. It's very hard to split up the target
  independent intrinsics like llvm.expect, assume, and dbg.value, though.
  llvm-svn: 335407
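  The payoff, sketched under the assumption that a TU only needs intrinsic
  IDs (the function here is invented for illustration):

    // Pulls in only the Intrinsic::* enum values, not the generated
    // implementation tables -- so edits to intrinsic properties no longer
    // rebuild this TU, and preprocessing stays cheap.
    #include "llvm/IR/Intrinsics.h"

    static bool isDbgValueID(unsigned IID) {
      return IID == llvm::Intrinsic::dbg_value;
    }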
* [X86][AsmParser] Rework that allows (%dx) to be used in place of %dx with in/out instructions. (Craig Topper, 2018-06-23, 1 file, -41/+29)
  Previously, to support (%dx) we left a wide open hole in our 16-bit
  memory address checking. This let this address value be used with any
  instruction without error in the parser. It would later fail in the
  encoder with an assertion failure on debug builds and who knows what on
  release builds. This patch passes the mnemonic down to the memory
  operand parsing function so we can allow the (%dx) form only on specific
  instructions.
  llvm-svn: 335403
* [X86][AsmParser] Keep track of whether an explicit scale was specified while parsing an address in Intel syntax. Use it for improved error checking. (Craig Topper, 2018-06-22, 1 file, -8/+16)
  This allows us to check these:
  - 16-bit addressing doesn't support scale, so we should error if we find
    one there.
  - Multiplying ESP/RSP by a scale, even if the scale is 1, should be an
    error because ESP/RSP can't be an index.
  llvm-svn: 335398
* [X86][AsmParser] In Intel syntax make sure we support ESP/RSP being the second register in memory expressions like [EAX+ESP]. (Craig Topper, 2018-06-22, 1 file, -0/+4)
  By default, the second register gets assigned to the index register
  slot. But ESP can't be an index register, so we need to swap it with the
  other register. There's still a slight bug: we allow [EAX+ESP*1]. The
  existence of the multiply, even though it's by 1, should force ESP to
  the index register and trigger an error, but it doesn't currently.
  llvm-svn: 335394
* [X86] Don't accept (%si,%bp) 16-bit address expressions. (Craig Topper, 2018-06-22, 1 file, -4/+9)
  The second register is the index register and should only be %si or %di
  if used with a base register. And in that case the base register should
  be %bp or %bx. This makes us compatible with gas. We do still need to
  support both orders with Intel syntax, which uses [bp+si] and [si+bp].
  llvm-svn: 335384
* [X86][AsmParser] Allow (%bp,%si) and (%bp,%di) to be encoded without using a zero displacement. (Craig Topper, 2018-06-22, 1 file, -1/+1)
  (%bp) can't be encoded without a displacement; that encoding is instead
  used for a displacement alone, so a 1-byte displacement of 0 must be
  used. But if there is an index register, we can encode without a
  displacement.
  llvm-svn: 335379
* [X86][AsmParser] Check for invalid 16-bit base register in Intel syntax. (Craig Topper, 2018-06-22, 1 file, -19/+24)
  llvm-svn: 335373
* [X86] Don't allow ESP/RSP to be used as an index register in assembly. (Craig Topper, 2018-06-22, 1 file, -1/+2)
  Fixes PR37892.
  llvm-svn: 335370
* Recommit of r335326, with the test fixed that I missed. (Sjoerd Meijer, 2018-06-22, 1 file, -3/+6)
  llvm-svn: 335331
* [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc (Simon Pilgrim, 2018-06-22, 1 file, -17/+32)
  AArch64 was only setting costs for SK_Transpose, which meant that many
  of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for
  larger vector elements) were being severely overestimated by the default
  shuffle expansion.
  This patch adds costs to help improve SLP performance and avoid a
  regression in reductions introduced by D48174.
  I'm not very knowledgeable about AArch64 shuffle lowering so I've kept
  the extra costs to a minimum - someone who knows this code can add extra
  costs which should improve vectorization a lot more.
  Differential Revision: https://reviews.llvm.org/D48172
  llvm-svn: 335329
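  The mechanism, roughly, in the style of LLVM's per-target cost tables (a
  sketch of the inside of AArch64TTIImpl::getShuffleCost; the entries and
  cost values are illustrative, not the patch's actual numbers):

    #include "llvm/CodeGen/CostTable.h"
    // After legalizing the shuffle's type to LT, consult a table of
    // known-cheap shuffles before falling back to the default (often
    // wildly pessimistic) expansion cost.
    static const CostTblEntry ShuffleTbl[] = {
        {TTI::SK_Select,           MVT::v4i32, 2}, // illustrative costs
        {TTI::SK_PermuteSingleSrc, MVT::v2i64, 1},
    };
    if (const auto *Entry = CostTableLookup(ShuffleTbl, Kind, LT.second))
      return LT.first * Entry->Cost;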
* Reverting r335326 while I look at the test failure (Sjoerd Meijer, 2018-06-22, 1 file, -6/+3)
  llvm-svn: 335328
* [ARM] ARMv6m and v8m.baseline strict align (Sjoerd Meijer, 2018-06-22, 1 file, -3/+6)
  This sets target feature FeatureStrictAlign for Armv6-m and
  Armv8-m.baseline, because they have no support for unaligned accesses.
  It looks like we always pass target feature "+strict-align" from Clang,
  so this is not a user-facing problem, but querying the subtarget (in
  e.g. llc) for unaligned access support is incorrect.
  Differential Revision: https://reviews.llvm.org/D48437
  llvm-svn: 335326
* AMDGPU: Add patterns for i32/i64 local atomic load/store (Matt Arsenault, 2018-06-22, 4 files, -1/+54)
  Not sure why the 32/64 split is needed in the atomic_load/atomic_store
  hierarchies. The regular PatFrags do this, but we don't do it for the
  existing handling for global.
  llvm-svn: 335325
* [X86] Changing the check for valid inputs in combineScalarToVector (Mikhail Dvoretckii, 2018-06-22, 1 file, -5/+6)
  Changing the logic of scalar mask folding to check for valid input types
  rather than against invalid ones, making it more robust and fixing
  PR37879.
  Differential Revision: https://reviews.llvm.org/D48366
  llvm-svn: 335323
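  The shape of the change, hedged (a generic allowlist-vs-denylist
  illustration; the vector types are made up, not the patch's):

    #include "llvm/CodeGen/ValueTypes.h"
    using namespace llvm;

    // Fragile: rejecting known-bad cases admits any type nobody thought
    // of (the PR37879 situation) into code that can't handle it.
    static bool okDenylist(EVT VT) {
      return VT != MVT::v8i16 && VT != MVT::v16i8;
    }

    // Robust: proceed only for types the fold is actually valid for.
    static bool okAllowlist(EVT VT) {
      return VT == MVT::v4i32 || VT == MVT::v2i64;
    }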
* AMDGPU/GlobalISel: Default to using TableGen'd instruction selector (Tom Stellard, 2018-06-22, 1 file, -7/+0)
  Summary: We can select all instructions that are marked as legal in a
  full piglit run, so now is a good time to make the TableGen'd
  instruction selector default for all opcodes. This is NFC for a full
  piglit run, which is why there are no tests.
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard,
  tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48198
  llvm-svn: 335319
* AMDGPU/GlobalISel: legalize and select 32-bit G_ASHR (Tom Stellard, 2018-06-22, 4 files, -0/+47)
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard,
  tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D48196
  llvm-svn: 335318
* AMDGPU/GlobalISel: legalize and select 32-bit G_SITOFP (Tom Stellard, 2018-06-22, 4 files, -0/+18)
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard,
  tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48195
  llvm-svn: 335316
* AMDGPU/GlobalISel: Implement select() for COPY (Tom Stellard, 2018-06-22, 1 file, -1/+4)
  Reviewers: arsenm, nhaehnle
  Reviewed By: nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard,
  tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46151
  llvm-svn: 335315
* AMDGPU/GlobalISel: Implement select() for G_IMPLICIT_DEF (Tom Stellard, 2018-06-21, 2 files, -0/+16)
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard,
  tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46150
  llvm-svn: 335307
* Revert r335297 "[X86] Implement more of x86-64 large and medium PIC code models" (Reid Kleckner, 2018-06-21, 6 files, -115/+29)
  MCJIT can't handle R_X86_64_GOT64 yet.
  llvm-svn: 335300
* [X86] Commit some comments that weren't in the medium code model patch (Reid Kleckner, 2018-06-21, 1 file, -4/+4)
  llvm-svn: 335298
* [X86] Implement more of x86-64 large and medium PIC code models (Reid Kleckner, 2018-06-21, 6 files, -27/+113)
  Summary: The large code model allows code and data segments to exceed
  2GB, which means that some symbol references may require a displacement
  that cannot be encoded as a displacement from RIP. The large PIC model
  even relaxes the assumption that the GOT itself is within 2GB of all
  code. Therefore, we need a special code sequence to materialize it:
    .LtmpN:
      leaq .LtmpN(%rip), %rbx
      movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
      addq %rax, %rbx                             # GOT base reg
  From that, non-local references go through the GOT base register instead
  of being PC-relative loads. Local references typically use GOTOFF
  symbols, like this:
      movq extern_gv@GOT(%rbx), %rax
      movq local_gv@GOTOFF(%rbx), %rax
  All calls end up being indirect:
      movabsq $local_fn@GOTOFF, %rax
      addq %rbx, %rax
      callq *%rax
  The medium code model retains the assumption that the code segment is
  less than 2GB, so calls are once again direct, and the RIP-relative
  loads can be used to access the GOT. Materializing the GOT is easy:
      leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx    # GOT base reg
  DSO local data accesses will use it:
      movq local_gv@GOTOFF(%rbx), %rax
  Non-local data accesses will use RIP-relative addressing, which means we
  may not always need to materialize the GOT base:
      movq extern_gv@GOTPCREL(%rip), %rax
  Direct calls are basically the same as they are in the small code model:
  they use direct, PC-relative addressing, and the PLT is used for calls
  to non-local functions.
  This patch adds reasonably comprehensive testing of LEA, but there are
  lots of interesting folding opportunities that are unimplemented.
  Reviewers: chandlerc, echristo
  Subscribers: hiraditya, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47211
  llvm-svn: 335297
* AMDGPU: Remove ability to reserve VGPRs for debugger (Konstantin Zhuravlyov, 2018-06-21, 6 files, -50/+2)
  Differential Revision: https://reviews.llvm.org/D48234
  llvm-svn: 335288
* [AMDGPU] Update assembler for HSA Code Object v3 (Scott Linder, 2018-06-21, 6 files, -75/+698)
  Update AMDGPU assembler syntax behind the code-object-v3 feature:
  * Replace/rename most AMDGPU assembler directives/symbols and document
    them.
  * Provide more diagnostics (e.g. values out of range, missing values,
    repeated values).
  * Provide a path for backwards compatibility, even with underlying
    descriptor changes.
  Differential Revision: https://reviews.llvm.org/D47736
  llvm-svn: 335281
* [mips] Modify comment to test new email address (NFC). (Simon Dardis, 2018-06-21, 1 file, -1/+1)
  llvm-svn: 335269
* [AMDGPU] Fix bug with tracking processed blocks in SIInsertWaitcnts (Scott Linder, 2018-06-21, 1 file, -0/+1)
  BlockWaitcntProcessedSet was not being cleared between calls, so it was
  producing incorrect counts in cases where MBB addresses happened to
  coincide across multiple calls.
  Differential Revision: https://reviews.llvm.org/D48391
  llvm-svn: 335268
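  The classic stale-pass-state fix, sketched under the assumption
  (consistent with the one-line diff) that the set is a pass member
  cleared on function entry:

    bool SIInsertWaitcnts::runOnMachineFunction(MachineFunction &MF) {
      // Pass objects outlive a single function, and a new function's MBBs
      // can be allocated at addresses an earlier function's blocks used,
      // so a set keyed by MBB pointers must be reset per function.
      BlockWaitcntProcessedSet.clear();
      // ... per-block waitcnt insertion proceeds as before ...
      return true;
    }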
* AMDGPU/AMDHSA: Remove GridWorkGroupCountX/Y/Z (Konstantin Zhuravlyov, 2018-06-21, 5 files, -51/+0)
  ... and everything that comes with it from implementation and v3 header
  files. Leave the definition in v2 header files for backwards
  compatibility.
  Differential Revision: https://reviews.llvm.org/D48191
  llvm-svn: 335267
* Revert "[AArch64] Coalesce Copy Zero during instruction selection"Sirish Pande2018-06-211-29/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit d8f57105010cc7e78026e511d5def873fc91e0e7. Original Commit: Author: Haicheng Wu <haicheng@codeaurora.org> Date: Sun Feb 18 13:51:33 2018 +0000 [AArch64] Coalesce Copy Zero during instruction selection Add special case for copy of zero to avoid a double copy. Differential Revision: https://reviews.llvm.org/D36104 Author's intention is to remove a BB that has one mov instruction. In order to do that, d8f571050 pessmizes MachineSinking by introducing a copy, such that mov instruction is NOT moved to the BB. Optimization downstream gets rid of the BB with only mov instruction. This works well if we have only one fall through branch as there is only one "extra" mov instruction. If we have multiple fall throughs, we will have a lot of redundant movs. In such a case, it's better to have this BB which has one mov instruction. This is causing degradation in jpeg, fft and other codebases. I believe if we want to remove a BB with only one branch instruction, we should not pessimize Machine Sinking at all, and find some other solution. llvm-svn: 335251
* [ARM] Enable useAA() for the in-order Cortex-R52 (David Green, 2018-06-21, 2 files, -1/+13)
  This option allows codegen (such as DAGCombine or MI scheduling) to use
  alias analysis information, which can help with the codegen on in-order
  cpus, especially machine scheduling. Here I have done things the same
  way as AArch64, adding a subtarget feature to enable this for specific
  cores, and enabled it for the R52 where we have a schedule to make use
  of it.
  Differential Revision: https://reviews.llvm.org/D48074
  llvm-svn: 335249
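  A hedged sketch of how such a per-core opt-in is typically wired (member
  and feature names assumed from the AArch64 precedent the patch cites,
  not the literal diff):

    // The subtarget stores the flag, populated from a "use-aa" subtarget
    // feature that only certain cores (here, Cortex-R52) enable; codegen
    // queries it through the generic useAA() hook.
    class ARMSubtarget /* : public ARMGenSubtargetInfo */ {
      bool UseAA = false;

    public:
      bool useAA() const { return UseAA; }
    };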
* [RISCV] Tail calls don't need to save return address (Sameer AbuAsal, 2018-06-21, 1 file, -2/+6)
  Summary: When expanding the PseudoTail in expandFunctionCall() we were
  using X6 to save the return address. Since this is a tail call, the
  return address is not needed; this patch replaces it with X0 so the
  write is ignored. This matches the behaviour listed in the ISA V2.2
  document, page 110:
    tail offset  ->  jalr x0, x6, offset
  GCC exhibits the same behavior.
  Reviewers: apazos, asb, mgrang
  Reviewed By: asb
  Subscribers: rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217,
  zzheng, edward-jones, rogfer01
  Differential Revision: https://reviews.llvm.org/D48343
  llvm-svn: 335239
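  The decision point, sketched (the helper is invented; the register enum
  values are the backend's):

    // A tail call never returns to the caller, so linking the return
    // address into a register just clobbers it for no benefit; rd = x0
    // makes the jalr discard the link address.
    static unsigned pickLinkReg(bool IsTailCall) {
      return IsTailCall ? RISCV::X0  // "jalr x0, x6, offset"
                        : RISCV::X1; // normal calls keep the return address
    }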
* [x86] Lower some trunc + shuffle patterns to vpmov[q|d][b|w] (Mikhail Dvoretckii, 2018-06-21, 1 file, -7/+118)
  This should help in lowering the following four intrinsics:
    _mm256_cvtepi32_epi8
    _mm256_cvtepi64_epi16
    _mm256_cvtepi64_epi8
    _mm512_cvtepi64_epi8
  Differential Revision: https://reviews.llvm.org/D46957
  llvm-svn: 335238