summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Remove ProcModel and ProcFeatures tablegen classes. Move all feature ↵Craig Topper2019-03-111-392/+380
| | | | | | | | | | | | | | | | | | lists into a ProcessorFeatures class. ProcFeatures was a class that just concatenated two feature lists together and gave it a name. We used it to inherit features between CPUs. ProcModel took a two CPU feature lists and concatenated them before deferring to ProcessorModel. This was to allow inherited features and specific features to be passed to each CPU. Both of these allowed for only very rigid CPU inheritance rules. With this patch we now store all of the lists we were using for inheritance in one object and do any list oncatenation we want there. Then we just pass whatever list we want from this class into the ProcessorModel class for each CPU. Hopefully this gives us more flexibility to build up feature lists in whatever ways we think make sense. Perhaps untangling ISA flags and tuning flags. I've only touched the CPUs that were directly affected by the removal of the ProcModel and ProcFeatures classes. We should move more of the feature lists into ProcessorFeatures. llvm-svn: 355872
* Recommit "[GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT"Jessica Paquette2019-03-113-18/+146
| | | | | | | | | After r355865, we should be able to safely select G_EXTRACT_VECTOR_ELT without running into any problematic intrinsics. Also add a fix for lane copies, which don't support index 0. llvm-svn: 355871
* Remove ASan asm instrumentation.Evgeniy Stepanov2019-03-114-1165/+1
| | | | | | | | | | | | | | Summary: It is incomplete and has no users AFAIK. Reviewers: pcc, vitalybuka Subscribers: srhines, kubamracek, mgorny, krytarowski, eraman, hiraditya, jdoerfert, #sanitizers, llvm-commits, thakis Tags: #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D59154 llvm-svn: 355870
* [RISCV] Do a sign-extension in a compare-and-swap of 32 bit in RV64AAlex Bradbury2019-03-111-0/+4
| | | | | | | | | | | | | | | | AtomicCmpSwapWithSuccess is legalised into an AtomicCmpSwap plus a comparison. This requires an extension of the value which, by default, is a zero-extension. When we later lower AtomicCmpSwap into a PseudoCmpXchg32 and then expanded in RISCVExpandPseudoInsts.cpp, the lr.w instruction does a sign-extension. This mismatch of extensions causes the comparison to fail when the compared value is negative. This change overrides TargetLowering::getExtendForAtomicOps for RISC-V so it does a sign-extension instead. Differential Revision: https://reviews.llvm.org/D58829 Patch by Ferran Pallarès Roca. llvm-svn: 355869
* [RISCV] Allow fp as an alias of s0Alex Bradbury2019-03-111-1/+1
| | | | | | | | | | | The RISC-V Assembly Programmer's Manual defines fp as another alias of x8. However, our tablegen rules only recognise s0. This patch adds fp as another alias of x8. GCC also accepts fp. Differential Revision: https://reviews.llvm.org/D59209 Patch by Ferran Pallarès Roca. llvm-svn: 355867
* [GlobalISel][AArch64] Always fall back on aarch64.neon.addp.*Jessica Paquette2019-03-112-0/+27
| | | | | | | | | | | | | | Overloaded intrinsics aren't necessarily safe for instruction selection. One such intrinsic is aarch64.neon.addp.*. This is a temporary workaround to ensure that we always fall back on that intrinsic. Eventually this will be replaced with a proper solution. https://bugs.llvm.org/show_bug.cgi?id=40968 Differential Revision: https://reviews.llvm.org/D59062 llvm-svn: 355865
* [RISCV][NFC] Convert some MachineBaiscBlock::iterator(MI) to MI.getIterator()Alex Bradbury2019-03-112-4/+4
| | | | llvm-svn: 355864
* [SDAG][AArch64] Legalize VECREDUCENikita Popov2019-03-111-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes https://bugs.llvm.org/show_bug.cgi?id=36796. Implement basic legalizations (PromoteIntRes, PromoteIntOp, ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes. There are more legalizations missing (esp float legalizations), but there's no way to test them right now, so I'm not adding them. This also includes a few more changes to make this work somewhat reasonably: * Add support for expanding VECREDUCE in SDAG. Usually experimental.vector.reduce is expanded prior to codegen, but if the target does have native vector reduce, it may of course still be necessary to expand due to legalization issues. This uses a shuffle reduction if possible, followed by a naive scalar reduction. * Allow the result type of integer VECREDUCE to be larger than the vector element type. For example we need to be able to reduce a v8i8 into an (nominally) i32 result type on AArch64. * Use the vector operand type rather than the scalar result type to determine the action, so we can control exactly which vector types are supported. Also change the legalize vector op code to handle operations that only have vector operands, but no vector results, as is the case for VECREDUCE. * Default VECREDUCE to Expand. On AArch64 (only target using VECREDUCE), explicitly specify for which vector types the reductions are supported. This does not handle anything related to VECREDUCE_STRICT_*. Differential Revision: https://reviews.llvm.org/D58015 llvm-svn: 355860
* [NFC][PowerPC] Add comment for PPCAsmPrinter::printOperandJinsong Ji2019-03-111-2/+6
| | | | | | Patch by Yi-Hong Lyu llvm-svn: 355848
* Use bitset for assembler predicatesStanislav Mekhanoshin2019-03-1117-137/+161
| | | | | | | | | | | | | | AMDGPU target run out of Subtarget feature flags hitting the limit of 64. AssemblerPredicates uses at most uint64_t for their representation. At the same time CodeGen has exhausted this a long time ago and switched to a FeatureBitset with the current limit of 192 bits. This patch completes transition to the bitset for feature bits extending it to asm matcher and MC code emitter. Differential Revision: https://reviews.llvm.org/D59002 llvm-svn: 355839
* [AMDGPU] Mark enum types in SIDefines.h as unsignedStanislav Mekhanoshin2019-03-114-21/+21
| | | | | | | | MSVC issues some warnings about signed/unsigned comparison. Differential Revision: https://reviews.llvm.org/D59171 llvm-svn: 355836
* [MIPS][microMIPS] Add a pattern to match TruncIntFPPetar Jovanovic2019-03-112-1/+8
| | | | | | | | | | | A pattern needed to match TruncIntFP was missing. This was causing multiple tests from llvm test suite to fail during compilation for micromips. Patch by Mirko Brkusanin. Differential Revision: https://reviews.llvm.org/D58722 llvm-svn: 355825
* [MIPS GlobalISel] NarrowScalar G_UMULHPetar Avramovic2019-03-111-1/+2
| | | | | | | | | | NarrowScalar G_UMULH in LegalizerHelper using multiplyRegisters helper function. NarrowScalar G_UMULH for MIPS32. Differential Revision: https://reviews.llvm.org/D58825 llvm-svn: 355815
* [MIPS GlobalISel] NarrowScalar G_MULPetar Avramovic2019-03-111-5/+1
| | | | | | | | | | | | | Narrow Scalar G_MUL for MIPS32. Revisit NarrowScalar implementation in LegalizerHelper. Introduce new helper function multiplyRegisters. It performs generic multiplication of values held in multiple registers. Generated instructions use only types NarrowTy and i1. Destination can be same or two times size of the source. Differential Revision: https://reviews.llvm.org/D58824 llvm-svn: 355814
* [X86] Enable sse2_cvtsd2ss intrinsic to use an EVEX encoded instruction.Craig Topper2019-03-112-8/+9
| | | | llvm-svn: 355810
* [X86] Remove apparently unneeded patterns for storing a bitcasted ↵Craig Topper2019-03-111-12/+0
| | | | | | | | extractelement. I suspect if this pattern was seen, DAG combine would just change the type of the store to eliminate the bitcast. llvm-svn: 355809
* [X86] Use 'UseAVX' in place of 'HasAVX, NoAVX512'. NFCCraig Topper2019-03-111-1/+1
| | | | | | They mean the same thing, but 'HasAVX, NoAVX512' only appears in this one place. Every other place uses UseAVX. llvm-svn: 355808
* [X86] Add SCALAR_SINT_TO_FP/SCALAR_UINT_TO_FP ISD opcodes without rounding mode.Craig Topper2019-03-115-22/+29
| | | | | | After this we no longer need to match FROUND_CURRENT or FROUND_NO_EXC during isel so I remove those. llvm-svn: 355807
* [X86] Split SCALEF(S) ISD opcodes into a version without rounding mode.Craig Topper2019-03-115-66/+56
| | | | llvm-svn: 355806
* [X86] Split RCP28/RSQRT/GETEXP/EXP2 ISD opcodes into SAE and current ↵Craig Topper2019-03-115-98/+99
| | | | | | direction nodes. Remove rounding mode operand. llvm-svn: 355805
* [X86] Rename _RND versions of RANGE/REDUCE/GETMANT/RDNSCALE ISD opcodes to ↵Craig Topper2019-03-115-155/+124
| | | | | | | | _SAE. Remove SAE operand. No need to explicitly store it and match it during isel. llvm-svn: 355804
* [X86] Rename X86ISD::CVTPH2PS_RND to CVTPH2PS_SAE. Remove SAE operand.Craig Topper2019-03-115-10/+8
| | | | llvm-svn: 355803
* [X86] Rename the CVTT*_RND ISD nodes to _SAE and remove the SAE operand. ↵Craig Topper2019-03-115-101/+158
| | | | | | | | Split VFPROUNDS_RND/VFPEXT(S)_RND into versions without rounding operand. For VFPEXT(S) we only need current rounding mode and an SAE version. Neither need extra operand. llvm-svn: 355802
* [X86] Rename X86ISD::CMPM_RND and X86ISD::FSETCCM_RND to _SAE instead of ↵Craig Topper2019-03-116-35/+22
| | | | | | | | _RND. Remove rounding operand. The operand could only be the SAE encoding so no need to include it. llvm-svn: 355801
* [X86] Split the VFIXUPIMM/VFIXUPIMMS nodes into a current rounding mode and ↵Craig Topper2019-03-115-92/+84
| | | | | | | | SAE ISD opcode. Remove matching of FROUND_CURRENT and FROUND_NO_EXC for these nodes from isel table. llvm-svn: 355800
* [X86] Begin removing matching of FROUND_CURRENT and FROUND_NO_EXC from isel ↵Craig Topper2019-03-115-74/+111
| | | | | | | | | | tables. Instead I plan to have dedicated nodes for FROUND_CURRENT and FROUND_NO_EXC. This patch starts with FADDS/FSUBS/FMULS/FDIVS/FMAXS/FMINS/FSQRTS. llvm-svn: 355799
* [PowerPC] Remove the override of isMachineVerifierClean() to open machine ↵Zi Xuan Wu2019-03-111-4/+0
| | | | | | | | | | | | | | | | verifier After fix all asserts found by machine verifier in PowerPC target with following patches, we can activate machine verifier as default. rL293769, rL348566, rL349030, rL349029, rL350113, rL350111, rL350799, rL350165, rL355378, rL352174, rL354762, rL350115 It's also found in PR#27456, https://bugs.llvm.org/show_bug.cgi?id=27456 Differential Revision: https://reviews.llvm.org/D59011 llvm-svn: 355798
* [X86] Remove unneeded isel patterns from VCVTSI2SDZ and VCVTUSI2SDZ. NFCCraig Topper2019-03-111-3/+3
| | | | | | | | We had patterns using X86ISD::SCALAR_SINT_TO_FP_RND/SCALAR_UINT_TO_FP_RND for these instructions. There's nothing to round. Instead, we use a regular sint_to_fp/uint_to_fp and a movsd as the pattern for these. llvm-svn: 355796
* [X86] Remove VCVTSI2SDZrrb_Int as it shouldn't exist.Craig Topper2019-03-112-2/+1
| | | | | | This would convert a signed 32-bit integer to double precision with rounding. But there's nothing to round. llvm-svn: 355795
* [x86] add x86-specific opcodes to extractelement scalarization listSanjay Patel2019-03-101-4/+8
| | | | llvm-svn: 355792
* [X86] Remove unused variable. NFCCraig Topper2019-03-101-1/+0
| | | | llvm-svn: 355790
* [X86] Make lowering of intrinsics with rounding mode stricter so that only ↵Craig Topper2019-03-102-32/+68
| | | | | | | | | | valid rounding modes are lowered. Update tests accordingly Many of our tests were not using valid rounding mode immediates. Clang verifies this in the frontend when it creates the intrinsics from builtins, but the backend would still lower invalid immediates. With this change we will now leave them as intrinsics if the immediate is invalid. This will cause an isel selection failure. llvm-svn: 355789
* [X86] Remove dead code from the handler for INTR_TYPE_SCALAR_MASK_RM.Craig Topper2019-03-101-13/+3
| | | | | | The code in here handles nodes with 6 or 7 operands. But only the 6 operand case is ever used these days. llvm-svn: 355788
* Recommit r355224 "[TableGen][SelectionDAG][X86] Add specific isel matchers ↵Craig Topper2019-03-104-59/+53
| | | | | | | | | | | | | | | | | | for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary." Includes a fix to emit a CheckOpcode for build_vector when immAllZerosV/immAllOnesV is used as a pattern root. This means it can't be used to look through bitcasts when used as a root, but that's probably ok. This extra CheckOpcode will ensure that the first match in the isel table will be a SwitchOpcode which is needed by the caching optimization in the ISel Matcher. Original commit message: Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally the ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead of we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts. By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up. This removes something like 40,000 bytes from the X86 isel table. Differential Revision: https://reviews.llvm.org/D58595 llvm-svn: 355784
* [RISCV][NFC] Minor refactoring of CC_RISCVAlex Bradbury2019-03-091-7/+7
| | | | | | | Immediately check if we need to early-exit as we have a return value that can't be returned directly. Also tweak following if/else. llvm-svn: 355773
* [RISCV][NFC] Split out emitSelectPseudo from EmitInstrWithCustomInserterAlex Bradbury2019-03-091-16/+19
| | | | | | It's cleaner and more consistent to have a separate helper function here. llvm-svn: 355772
* [RISCV] Support -target-abi at the MC layer and for codegenAlex Bradbury2019-03-0910-15/+131
| | | | | | | | | | | | | | | | This patch adds proper handling of -target-abi, as accepted by llvm-mc and llc. Lowering (codegen) for the hard-float ABIs will follow in a subsequent patch. However, this patch does add MC layer support for the hard float and RVE ABIs (emission of the appropriate ELF flags https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#-file-header). ABI parsing must be shared between codegen and the MC layer, so we add computeTargetABI to RISCVUtils. A warning will be printed if an invalid or unrecognized ABI is given. Differential Revision: https://reviews.llvm.org/D59023 llvm-svn: 355771
* [WebAssembly] Use named operands to identify loads and storesThomas Lively2019-03-098-135/+42
| | | | | | | | | | | | | | | | | | | Summary: Uses the named operands tablegen feature to look up the indices of offset, address, and p2align operands for all load and store instructions. This replaces brittle, incorrect logic for identifying loads and store when eliminating frame indices, which previously crashed on bulk-memory ops. It also cleans up the SetP2Alignment pass. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59007 llvm-svn: 355770
* [RISCV] Allow access to FP CSRs without F extensionAna Pazos2019-03-081-2/+0
| | | | | | | | | | | | | | | | | | Summary: Floating-point CSRs should be accessible even when F extension is not enabled. But pseudo instructions that access floating point CSRs still require the F extension. GNU tools already implement this behavior. RISC-V spec is pending update to reflect this behavior and to extend it to pseudo instructions that access floating point CSRs. Reviewers: asb Reviewed By: asb Subscribers: asb, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, llvm-commits Differential Revision: https://reviews.llvm.org/D58932 llvm-svn: 355753
* [AArch64][GlobalISel] Fix i1 arguments not being zero-extended as required ↵Amara Emerson2019-03-081-0/+3
| | | | | | | | by ABI. Fixes PR41001. llvm-svn: 355745
* [x86] scalarize extract element 0 of FP cmpSanjay Patel2019-03-081-0/+16
| | | | | | | | | | | | | An extension of D58282 noted in PR39665: https://bugs.llvm.org/show_bug.cgi?id=39665 This doesn't answer the request to use movmsk, but that's an independent problem. We need this and probably still need scalarization of FP selects because we can't do that as a target-independent transform (although it seems likely that targets besides x86 should have this transform). llvm-svn: 355741
* [NVPTX][DEBUGINFO]Temp workaround for crash of ptxas: disable packed bytes ↵Alexey Bataev2019-03-081-0/+6
| | | | | | | | | | | | | | | | | | | in debug sections. Summary: This patch works around the bug in the ptxas tool with the processing of bytes separated by the comma symbol. The emission of the packed string is temporarily disabled. Reviewers: tra Subscribers: jholewinski, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59148 llvm-svn: 355740
* [HWASan] Save + print registers when tag mismatch occurs in AArch64.Mitch Phillips2019-03-081-2/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This change change the instrumentation to allow users to view the registers at the point at which tag mismatch occured. Most of the heavy lifting is done in the runtime library, where we save the registers to the stack and emit unwind information. This allows us to reduce the overhead, as very little additional work needs to be done in each __hwasan_check instance. In this implementation, the fast path of __hwasan_check is unmodified. There are an additional 4 instructions (16B) emitted in the slow path in every __hwasan_check instance. This may increase binary size somewhat, but as most of the work is done in the runtime library, it's manageable. The failure trace now contains a list of registers at the point of which the failure occured, in a format similar to that of Android's tombstones. It currently has the following format: Registers where the failure occurred (pc 0x0055555561b4): x0 0000000000000014 x1 0000007ffffff6c0 x2 1100007ffffff6d0 x3 12000056ffffe025 x4 0000007fff800000 x5 0000000000000014 x6 0000007fff800000 x7 0000000000000001 x8 12000056ffffe020 x9 0200007700000000 x10 0200007700000000 x11 0000000000000000 x12 0000007fffffdde0 x13 0000000000000000 x14 02b65b01f7a97490 x15 0000000000000000 x16 0000007fb77376b8 x17 0000000000000012 x18 0000007fb7ed6000 x19 0000005555556078 x20 0000007ffffff768 x21 0000007ffffff778 x22 0000000000000001 x23 0000000000000000 x24 0000000000000000 x25 0000000000000000 x26 0000000000000000 x27 0000000000000000 x28 0000000000000000 x29 0000007ffffff6f0 x30 00000055555561b4 ... and prints after the dump of memory tags around the buggy address. Every register is saved exactly as it was at the point where the tag mismatch occurs, with the exception of x16/x17. These registers are used in the tag mismatch calculation as scratch registers during __hwasan_check, and cannot be saved without affecting the fast path. As these registers are designated as scratch registers for linking, there should be no important information in them that could aid in debugging. Reviewers: pcc, eugenis Reviewed By: pcc, eugenis Subscribers: srhines, kubamracek, mgorny, javed.absar, krytarowski, kristof.beyls, hiraditya, jdoerfert, llvm-commits, #sanitizers Tags: #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D58857 llvm-svn: 355738
* AMDGPU: Move d16 load matching to preprocess stepMatt Arsenault2019-03-0810-195/+348
| | | | | | | | | | | | | | | | | When matching half of the build_vector to a load, there could still be a hidden dependency on the other half of the build_vector the pattern wouldn't detect. If there was an additional chain dependency on the other value, a cycle could be introduced. I don't think a tablegen pattern is capable of matching the necessary conditions, so move this into PreprocessISelDAG. Check isPredecessorOf for the other value to avoid a cycle. This has a warning that it's expensive, so this should probably be moved into an MI pass eventually that will have more freedom to reorder instructions to help match this. That is currently complicated by the lack of a computeKnownBits type mechanism for the selected function. llvm-svn: 355731
* DAG: Don't try to cluster loads with tied inputsMatt Arsenault2019-03-081-45/+0
| | | | | | | | | | | | | | | | | | | | | This avoids breaking possible value dependencies when sorting loads by offset. AMDGPU has some load instructions that write into the high or low bits of the destination register, and have a tied input for the other input bits. These can easily have the same base pointer, but be a swizzle so the high address load needs to come first. This was inserting glue forcing the opposite ordering, producing a cycle the InstrEmitter would assert on. It may be potentially expensive to look for the dependency between the other loads, so just skip any where this could happen. Fixes bug 40936 by reverting r351379, which added a hacky attempt to fix this by adding chains in this case, which I think was just working around broken glue before the InstrEmitter. The core of the patch is re-implementing the fix for that problem. llvm-svn: 355728
* AMDGPU: Don't bother checking the chain in areLoadsFromSameBasePtrMatt Arsenault2019-03-081-15/+0
| | | | | | | This is only called in contexts that are verifying the chain itself, and the query itself is only asking about the address. llvm-svn: 355723
* AMDGPU: Correct DS implementation of areLoadsFromSameBasePtrMatt Arsenault2019-03-081-4/+4
| | | | | | | | | This was checking the wrong operands for the base register and the offsets. The indexes are shifted by the number of output registers from the machine instruction definition, and the chain is moved to the end. llvm-svn: 355722
* [DEBUG_INFO][NVPTX]Emit empty .debug_loc section in presence of the debug ↵Alexey Bataev2019-03-081-1/+4
| | | | | | | | | | | | | | | | | | | | option. Summary: If the LLVM module shows that it has debug info, but the file is actually empty and the real debug info is not emitted, the ptxas tool emits error 'Debug information not found in presence of .target debug'. We need at leas one empty debug section to silence this message. Section `.debug_loc` is not emitted for PTX and we can emit empty `.debug_loc` section if `debug` option was emitted. Reviewers: tra Subscribers: jholewinski, aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D57250 llvm-svn: 355719
* [x86] prevent infinite looping from inverse shuffle transformsSanjay Patel2019-03-081-1/+6
| | | | llvm-svn: 355713
* [ARM][FIX] Fix vfmal.f16 and vfmsl.f16 operandDiogo N. Sampaio2019-03-082-10/+21
| | | | | | | | | | | | | | The indexed variant of vfmal.f16 and vfmsl.f16 instructions use the uppser bits of the indexed operand to store the index (1 bit for the double variant, 2 bits for the quad). This limits the usable registers to d0 - d7 or s0 - s15. This patch enforces this limitation. Differential Revision: https://reviews.llvm.org/D59021 llvm-svn: 355707
OpenPOWER on IntegriCloud