path: root/llvm/lib/Target/X86
Commit message | Author | Age | Files | Lines
...
* [X86] Improve lowering of v2i64 sign bit tests on pre-sse4.2 targets | Craig Topper | 2020-01-07 | 1 | -0/+13
  Without sse4.2 a v2i64 setlt needs to expand into a pcmpgtd, pcmpeqd, 3 shuffles, and 2 logic ops. But if we're only interested in the sign bit of the i64 elements, we can just use one pcmpgtd and shuffle the odd elements to the even elements.
  Differential Revision: https://reviews.llvm.org/D72302
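The sign-bit shortcut described above can be illustrated with a portable scalar model. This is only a sketch, not the code from the patch: the `V4i32` type and the helper names are hypothetical, and the 128-bit register is modeled as four 32-bit lanes in little-endian order (lanes 0-1 form i64 element 0, lanes 2-3 form element 1).

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of a 128-bit register viewed as four 32-bit lanes.
using V4i32 = std::array<int32_t, 4>;

// Models pcmpgtd(0, x): a lane becomes all-ones when 0 > x[lane], else zero.
V4i32 signMaskPerLane(const V4i32 &x) {
  V4i32 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = (0 > x[i]) ? -1 : 0;
  return r;
}

// Models a shuffle with mask {1, 1, 3, 3}: each odd lane is copied into the
// even lane below it, so both halves of an i64 element carry the same mask.
V4i32 oddToEven(const V4i32 &v) { return {v[1], v[1], v[3], v[3]}; }

// Each 64-bit element of the result is all-ones if the original i64 element
// was negative, because the sign of an i64 is the sign of its upper half.
V4i32 v2i64IsNegative(const V4i32 &x) { return oddToEven(signMaskPerLane(x)); }
```

The low halves never influence the result: a low lane with its top bit set yields a mask that the shuffle then overwrites with the mask from the high lane.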
* [X86] Pull out repeated SrcVT.getVectorNumElements() call. NFCI. | Simon Pilgrim | 2020-01-07 | 1 | -2/+2
* [X86] Standardize shuffle match/lowering function names. NFC. | Simon Pilgrim | 2020-01-07 | 1 | -38/+39
  We mainly use lowerShuffle*/matchShuffle* - replace the (few) lowerVectorShuffle*/matchVectorShuffle* cases to be consistent.
* [MC] Add parameter `Address` to MCInstPrinter::printInstruction | Fangrui Song | 2020-01-06 | 4 | -4/+4
  Follow-up of D72172.
  Reviewed By: jhenderson, rnk
  Differential Revision: https://reviews.llvm.org/D72180
* [MC] Add parameter `Address` to MCInstPrinter::printInst | Fangrui Song | 2020-01-06 | 4 | -9/+10
  printInst prints a branch/call instruction as `b offset` (there are many variants on various targets) instead of `b address`. It is a convention to use address instead of offset in most external symbolizers/disassemblers. This difference makes `llvm-objdump -d` output unsatisfactory.
  Add `uint64_t Address` to printInst(), so that it can pass the argument to printInstruction(). `raw_ostream &OS` is moved to the last position to be consistent with other print* methods.
  The next step is to pass `Address` to printInstruction() (generated by tablegen from the instruction set description). We can gradually migrate targets to print addresses instead of offsets.
  In any case, downstream projects which don't know `Address` can pass 0 as the argument.
  Reviewed By: jhenderson
  Differential Revision: https://reviews.llvm.org/D72172
* [X86] Move an enum definition into a header to simplify future patches [NFC] | Philip Reames | 2020-01-06 | 2 | -24/+26
* [X86] Improve v4i32->v4f64 uint_to_fp for AVX1/AVX2 targets. | Craig Topper | 2020-01-06 | 1 | -0/+15
  Use zext+or+fsub to do the conversion. Similar to D71945.
  Differential Revision: https://reviews.llvm.org/D71971
* [X86] Fix an 8 bit testb being selected when folding a volatile i32 load pattern. | Amara Emerson | 2020-01-06 | 1 | -0/+11
  Differential Revision: https://reviews.llvm.org/D71581
* Fix "use of uninitialized variable" static analyzer warnings. NFCI. | Simon Pilgrim | 2020-01-06 | 1 | -0/+2
  Add "unreachable" default cases like we do for the other switch()s in X86MCInstLower::Lower
* [CostModel][X86] Add missing scalar i64->f32 uitofp costs | Simon Pilgrim | 2020-01-06 | 1 | -0/+4
* [NFC] Fix trivial typos in comments | James Henderson | 2020-01-06 | 1 | -1/+1
  Reviewed By: jhenderson
  Differential Revision: https://reviews.llvm.org/D72143
  Patch by Kazuaki Ishizaki.
* Add interface emitPrefix for MCCodeEmitter | Shengchen Kan | 2020-01-06 | 1 | -89/+133
  Differential Revision: https://reviews.llvm.org/D72047
* [X86] Improve v2i64->v2f32 and v4i64->v4f32 uint_to_fp on avx and avx2 targets. | Craig Topper | 2020-01-05 | 1 | -24/+125
  Summary: Based on Simon's D52965, but improved to handle strict fp and improve some of the shuffling.
  Rather than use v2i1/v4i1 and let type legalization continue, just generate all the code with legal types and use an explicit shuffle.
  I also added an explicit setcc to the v4i64 code to match the semantics of vselect which doesn't just use the sign bit. I'm also using a v4i64->v4i32 truncate instead of the shuffle in Simon's original code. With the setcc this will become a pack.
  Future work can look into using X86ISD::BLENDV and a different shuffle that only moves the sign bit.
  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71956
* [NFC] Modify the format: | Liu, Chen3 | 2020-01-06 | 1 | -2/+1
  Drop the else since we already returned in the if.
* [X86][SSE] Combine combineLogicBlendIntoConditionalNegate for VSELECT nodes (PR43660) | Simon Pilgrim | 2020-01-05 | 1 | -2/+13
  Attempt to use combineLogicBlendIntoConditionalNegate for (select M, (sub 0, X), X) -> (sub (xor X, M), M)
  We limit this to cases that can't easily replace the VSELECT with a shuffle (non-constant masks) or where a BLENDV is likely to occur (which tends to result in slower codegen).
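The identity behind the fold above can be checked on a single lane. The following is a minimal scalar sketch, assuming the mask M is either all-zeros or all-ones (as a vector compare would produce); the function names are hypothetical and are not taken from the patch.

```cpp
#include <cstdint>

// Reference form: (select M, (sub 0, X), X).
// Assumes mask is either 0 or 0xFFFFFFFF.
uint32_t selectNegate(uint32_t x, uint32_t mask) {
  return mask ? (0u - x) : x;
}

// Folded form: (sub (xor X, M), M).
// With M = all-ones: (~x) - (-1) = ~x + 1 = -x in two's complement.
// With M = 0:        (x ^ 0) - 0 = x.
uint32_t conditionalNegate(uint32_t x, uint32_t mask) {
  return (x ^ mask) - mask;
}
```

Both forms agree for every `x` as long as the mask is one of the two allowed values, which is why the select can be replaced by two cheap integer ops.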
* [X86] Move combineLogicBlendIntoConditionalNegate before combineSelect. NFCI. | Simon Pilgrim | 2020-01-05 | 1 | -62/+62
  Updates function order in preparation of future fix for PR43660
* [X86] Merge (identical) LowerGC_TRANSITION_START and LowerGC_TRANSITION_END (NFC) | Simon Pilgrim | 2020-01-05 | 2 | -27/+4
  Silences a copy+paste analyzer warning - all they are doing is inserting NOOPs in exactly the same way.
* GlobalISel: Add type argument to getRegBankFromRegClass | Matt Arsenault | 2020-01-03 | 2 | -4/+5
  AMDGPU can't unambiguously go back from the selected instruction register class to the register bank without knowing if this was used in a boolean context.
* [X86] Improve for v2i32->v2f64 uint_to_fp | Craig Topper | 2020-01-03 | 1 | -36/+14
  This uses an alternative implementation of this conversion derived from our v2i32->v2f32 handling. We can zero extend the v2i32 to v2i64, OR it with the bit representation of 2.0^52 which will give us 2.0^52 plus the 32-bit integer since double's mantissa is 52 bits. Then we just need to subtract 2.0^52 as a double and let the floating point unit normalize the remaining bits into a valid double.
  This is fewer instructions than our previous code, but does require a port 5 shuffle for the zero extend or unpack.
  Differential Revision: https://reviews.llvm.org/D71945
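The zext+or+fsub sequence described above can be sketched for one scalar lane. This is an illustration under the assumption of IEEE 754 binary64 doubles, not the lowering code itself; the function name is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Converts an unsigned 32-bit integer to double using the 2^52 trick.
// Assumes IEEE 754 binary64, where 2^52 has the bit pattern
// 0x4330000000000000 (biased exponent 0x433, mantissa all zeros).
double uint32ToDouble(uint32_t x) {
  const uint64_t kTwo52Bits = 0x4330000000000000ULL;
  // Zero extend to 64 bits and OR the integer into the low mantissa bits.
  // The value of this bit pattern, read as a double, is exactly 2^52 + x.
  uint64_t bits = kTwo52Bits | static_cast<uint64_t>(x);
  double biased;
  std::memcpy(&biased, &bits, sizeof(biased));
  // Subtracting 2^52 leaves x; the FPU normalizes the result.
  return biased - 4503599627370496.0; // 2^52
}
```

The subtraction is exact because both 2^52 + x and x are representable in a double for any 32-bit x, so no rounding occurs.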
* Move tail call disabling code to target independent code | Reid Kleckner | 2020-01-03 | 1 | -7/+1
  When the "disable-tail-calls" attribute was added, checks were added for it in various backends. Now this code has proliferated, and it is something the target is responsible for checking. Move that responsibility back to the ISels (fast, global, and SD).
  There's no major functionality change, except for targets that never implemented this check.
  This LLVM attribute was originally added in d9699bc7bdf0362173fcd256690f61a4d47429c2 (2015).
  Reviewers: echristo, MaskRay
  Differential Revision: https://reviews.llvm.org/D72118
* Fix typo "psuedo" in comments | Jay Foad | 2020-01-03 | 1 | -1/+1
* [X86] Reorder X86any* PatFrags to put the strict node first so that chain property will be inferred for the instruction by the tablegen backend. | Craig Topper | 2020-01-03 | 4 | -10/+10
  Also use X86any_vfpround instead of X86vfpround in some instruction definitions so the strict version can be used to infer the chain property.
  Without these changes we don't propagate strict FP chain through isel for some instructions.
* [X86] Re-enable lowerUINT_TO_FP_vXi32 under fast-math by using an FSUB instead of an FADD. | Craig Topper | 2020-01-02 | 1 | -15/+9
  Summary: We previously disabled this under fast math due to aggressive reassociation by the machine combiner. But I think we can work around this by using a FSUB instead of FADD for the first operation. This matches the similar algorithm we do for uint_to_fp i64->f64 in TargetLowering::expandUINT_TO_FP. If reassociation hasn't been a problem for that, hopefully it's not a problem here.
  Reviewers: RKSimon, spatel, scanon
  Reviewed By: spatel
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71968
* [X86] Enable strict FP by default and remove option -disable-strictnode-mutation. NFCI. | Wang, Pengfei | 2020-01-03 | 1 | -0/+3
* [X86] Optimization of inserting vxi1 sub vector into vXi1 vector | Wang, Pengfei | 2020-01-03 | 1 | -2/+20
  Summary: After the bugfix for the undef value case here, we used more operations to implement inserting a vXi1 sub vector into a vXi1 vector. This patch optimizes it to use fewer operations.
  The history is at https://reviews.llvm.org/D68311
  Reviewers: craig.topper, LuoYuanke, yubing, annita.zhang, pengfei, LiuChen3, RKSimon
  Reviewed By: craig.topper
  Subscribers: hiraditya, llvm-commits
  Patch by Xiang Zhang (xiangzhangllvm)
  Differential Revision: https://reviews.llvm.org/D71917
* [X86] Move STRICT_ ISD nodes into the new section of X86ISelLowering.h where STRICT nodes are collected after D71841 | Craig Topper | 2020-01-02 | 1 | -4/+17
* X86: remove unused variable | Saleem Abdulrasool | 2020-01-02 | 1 | -1/+0
  Remove the now-unused variable from aa17d31edb00c66461093b5a7cd2f4a35dc143e9. The unused variable breaks `-Werror` builds.
* [X86] Remove FP0-6 operands from call instructions in FPStackifier pass. Only count defs as returns. | Craig Topper | 2020-01-02 | 1 | -9/+11
  All FP0-6 operands should be removed by the FP stackifier. By removing these we fix the machine verifier error in PR39437.
  I've also made it so that only defs are counted for STReturns which removes what I think were extra stack cleanup instructions.
  And I've removed the regcall assert because it was checking the attributes of the caller, but here we're concerned with the attributes of the callee. But I don't know how to get that information from this level.
* [FPEnv] Default NoFPExcept SDNodeFlag to false | Ulrich Weigand | 2020-01-02 | 1 | -6/+14
  The NoFPExcept bit in SDNodeFlags currently defaults to true, unlike all other such flags. This is a problem, because it implies that all code that transforms SDNodes without copying flags can introduce a correctness bug, not just a missed optimization.
  This patch changes the default to false. This makes it necessary to move setting the (No)FPExcept flag for constrained intrinsics from the visitConstrainedIntrinsic routine to the generic visit routine at the place where the other flags are set, or else the intersectFlagsWith call would erase the NoFPExcept flag again.
  In order to avoid making non-strict FP code worse, whenever SelectionDAGISel::SelectCodeCommon matches on a set of original nodes none of which can raise FP exceptions, it will preserve this property on all result nodes generated, by setting the NoFPExcept flag on those result nodes that would otherwise be considered as raising an FP exception.
  To check whether or not an SD node should be considered as raising an FP exception, the following logic applies:
  - For machine nodes, check the mayRaiseFPException property of the underlying MI instruction
  - For regular nodes, check isStrictFPOpcode
  - For target nodes, check a newly introduced isTargetStrictFPOpcode
  The latter is implemented by reserving a range of target opcodes, similarly to how memory opcodes are identified. (Note that there is a bit of a quirk in identifying target nodes that are both memory nodes and strict FP nodes. To simplify the logic, right now all target memory nodes are automatically also considered strict FP nodes -- this could be fixed by adding one more range.)
  Reviewed By: craig.topper
  Differential Revision: https://reviews.llvm.org/D71841
* [NFC] Make the type of X86AlignBranchBoundary compatible | Shengchen Kan | 2020-01-02 | 1 | -1/+1
  Change the type of X86AlignBranchBoundary from cl::opt<uint64_t> to cl::opt<unsigned> since the template class cl::opt is only instantiated with type unsigned, int, std::string, char and bool.
* [X86] Call SimplifyMultipleUseDemandedBits from combineVSelectToBLENDV if the condition is used by something other than select conditions. | Craig Topper | 2020-01-01 | 1 | -24/+42
  We might be able to bypass some nodes on the condition path.
  Differential Revision: https://reviews.llvm.org/D71984
* add strict float for round operation | Liu, Chen3 | 2020-01-01 | 6 | -41/+86
  Differential Revision: https://reviews.llvm.org/D72026
* [X86] Fix typo in getCMovOpcode. | Craig Topper | 2019-12-31 | 1 | -1/+1
  The 64-bit HasMemoryOperand line was using CMOV32rm instead of CMOV64rm.
  Not sure how to test this. We have no test coverage that passes true for HasMemoryOperand.
* [X86] Add X87 FCMOV support to X86FlagsCopyLowering. | Craig Topper | 2019-12-31 | 1 | -0/+73
  Fixes PR44396
* [X86] Constant fold KSHIFT of an all zeros vector to just an all zeros vector. | Craig Topper | 2019-12-31 | 1 | -0/+3
* [X86] Use carry flag from add for (seteq (add X, -1), -1). | Craig Topper | 2019-12-31 | 1 | -10/+31
  If we just subtracted 1 and are checking whether the result is -1, we can use the carry flag from the ADD instead of an explicit CMP.
  I'm using the same checks for the add users as EmitTest.
  Fixes one case from PR44412
  Differential Revision: https://reviews.llvm.org/D72019
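The reason the carry flag is enough can be seen in a small scalar model. This is a sketch with hypothetical helper names; it models the unsigned carry-out of a 32-bit add in plain C++ and makes no claim about which condition code the backend actually selects.

```cpp
#include <cstdint>

// Hypothetical model of an ADD that also reports its unsigned carry-out.
struct AddResult {
  uint32_t sum;
  bool carry;
};

AddResult addWithCarry(uint32_t a, uint32_t b) {
  uint32_t sum = static_cast<uint32_t>(a + b);
  // An unsigned add wrapped around exactly when the sum is below an operand.
  return {sum, sum < a};
}

// Reference form: compute X + (-1), then compare the result against -1.
bool decrementIsMinusOneViaCmp(uint32_t x) {
  return addWithCarry(x, 0xFFFFFFFFu).sum == 0xFFFFFFFFu;
}

// Flag form: X + 0xFFFFFFFF carries out for every X except 0, and X - 1
// equals -1 only for X == 0, so "no carry" answers the same question.
bool decrementIsMinusOneViaCarry(uint32_t x) {
  return !addWithCarry(x, 0xFFFFFFFFu).carry;
}
```

Because the flag is produced by the add itself, the separate compare against -1 becomes redundant whenever the add result and flag can both be consumed.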
* [X86] Slightly improve our attempted error recovery for 64-bit -mno-sse2 in LowerCallResult to use FP1 if there are two return values. | Craig Topper | 2019-12-31 | 1 | -2/+8
  If the return value is a struct of 2 doubles we need two return registers. If SSE2 is disabled we can't return in XMM registers like the ABI says. After logging an error we attempt to recover by using FP0 instead of an XMM register. But if the return needs two registers, we may have already used FP0. So if the register we were supposed to copy to is XMM1, copy to FP1 in the recovery instead.
  This seems to fix the assertion/crash in PR44413.
* [NFC] Style cleanup | Shengchen Kan | 2019-12-31 | 1 | -28/+29
  1. make function Is16BitMemOperand static
  2. Use Doxygen features in comment
  3. Rename functions to make them start with a lower case letter
* [NFC] Make X86MCCodeEmitter::isPCRel32Branch static | Shengchen Kan | 2019-12-31 | 1 | -4/+2
* [NFC] Style cleanup | Shengchen Kan | 2019-12-31 | 1 | -389/+479
  1. Remove function is64BitMode() and use STI.hasFeature(X86::Mode16Bit) directly
  2. Use Doxygen features in comment
  3. Rename functions to make them start with a lower case letter
  4. Format the code with clang-format
* Remove a redundant `default:` on an exhaustive switch(enum). | Eric Astor | 2019-12-30 | 1 | -2/+0
* [X86][AsmParser] re-introduce 'offset' operator | Eric Astor | 2019-12-30 | 3 | -88/+158
  Summary: Amend MS offset operator implementation, to more closely fit with its MS counterpart:
  1. InlineAsm: evaluate non-local source entities to their (address) location
  2. Provide a means by which one may acquire the address of an assembly label via MS syntax, rather than yielding a memory reference (i.e. "offset asm_label" and "$asm_label" should be synonymous)
  3. address PR32530
  Based on http://llvm.org/D37461
  Fix broken test where the break appears unrelated.
  - Set up appropriate memory-input rewrites for variable references.
  - Intel-dialect assembly printing now correctly handles addresses by adding "offset".
  - Pass offsets as immediate operands (using "r" constraint for offsets of locals).
  Reviewed By: rnk
  Differential Revision: https://reviews.llvm.org/D71436
* [X86] Add X86ISD::PCMPGT to SimplifyMultipleUseDemandedBitsForTargetNode. | Craig Topper | 2019-12-30 | 1 | -0/+7
  If only the sign bit is demanded, and the LHS is all zeroes, then we can bypass the PCMPGT.
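Why the compare can be bypassed is easy to confirm on one lane. The snippet below is a scalar sketch with hypothetical helper names, modeling a signed greater-than compare that yields an all-ones or all-zeros lane.

```cpp
#include <cstdint>

// Models one lane of a signed PCMPGT: all-ones if lhs > rhs, else zero.
int32_t pcmpgtLane(int32_t lhs, int32_t rhs) { return lhs > rhs ? -1 : 0; }

// Extracts the sign bit (bit 31) of a lane.
bool signBit(int32_t v) { return (static_cast<uint32_t>(v) >> 31) != 0; }

// With a zero LHS, the compare is true exactly when rhs is negative, so the
// sign bit of the compare result equals the sign bit of rhs itself.
bool signBitOfZeroCompare(int32_t rhs) { return signBit(pcmpgtLane(0, rhs)); }
```

A user that only reads the sign bit therefore sees the same value whether it is fed the compare result or the RHS operand directly.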
* [X86] Use APInt::isOneValue and ConstantSDNode::isOne. NFC | Craig Topper | 2019-12-29 | 1 | -4/+4
  These are implemented slightly more efficiently than comparing to 1 in the case that the value is more than 64 bits.
* [X86] Use isOneConstant to simplify some code. NFC | Craig Topper | 2019-12-29 | 1 | -2/+1
* [X86] Remove dyn_casts to ConstantSDNode for operand 1 of X86ISD::VSRLI/VSRAI/VSRLI. | Craig Topper | 2019-12-29 | 1 | -108/+99
  Use getConstantOperandVal and APInt operations.
  These nodes should only ever be formed with an i8 TargetConstant so we don't need to check for it to be a constant. It's also always 8-bits so we don't need to use APInt compare functions.
* [SelectionDAG] Disallow indirect "i" constraint | Fangrui Song | 2019-12-29 | 2 | -7/+1
  This allows us to delete InlineAsm::Constraint_i workarounds in SelectionDAGISel::SelectInlineAsmMemoryOperand overrides and TargetLowering::getInlineAsmMemConstraint overrides.
  They were introduced to X86 in r237517 to prevent crashes for constraints like "=*imr". They were later copied to other targets.
* [X86] Stop accidentally custom type legalizing v4i32->v4f32 on SSE1 only targets. | Craig Topper | 2019-12-28 | 1 | -2/+3
  We had a Custom operation action for v4i32 on SSE1. But since v4i32 isn't legal until SSE2 this was not what was intended. The code that gets executed was intended for op legalization and creates a bunch of v4i32 nodes that all end up scalarized.
* [X86] Remove a redundant (scalar_to_vector (extract_vector_elt X))) in LowerUINT_TO_FP_i32. NFCI | Craig Topper | 2019-12-28 | 1 | -6/+1
* [X86] Fix -enable-machine-outliner for x86-32 after D48683 | Fangrui Song | 2019-12-28 | 1 | -3/+1
  D48683 accidentally disabled -enable-machine-outliner for x86-32.