path: root/llvm/lib/Target/X86
Commit message / Author / Age / Files / Lines
...
* [X86] Remove dead code from X86DAGToDAGISel::Select that is no longer needed ↵Craig Topper2020-01-111-28/+0
| | | | now that we don't mutate strict fp nodes. NFC
* [X86] Simplify code by removing an unreachable condition. NFCICraig Topper2020-01-101-12/+2
| | | | | | For X87<->SSE conversions, the SSE type is always smaller than the X87 type. So we can always use the smallest type for the memory type.
* [X86] Preserve fpexcept property when turning strict_fp_extend and ↵Craig Topper2020-01-102-4/+37
| | | | | | | | | | | strict_fp_round into stack operations. We use the stack for X87 fp_round and for moving from SSE f32/f64 to X87 f64/f80. Or from X87 f64/f80 to SSE f32/f64. Note for the SSE<->X87 conversions the conversion always happens in the X87 domain. The load/store ops in the X87 instructions are able to signal exceptions.
* [X86][Disassembler] Simplify readPrefixesFangrui Song2020-01-101-43/+25
|
* [X86] Use ReplaceAllUsesWith instead of ReplaceAllUsesOfValueWith to ↵Craig Topper2020-01-101-12/+2
| | | | simplify some code. NFCI
* [X86] Support function attribute "patchable-function-entry"Fangrui Song2020-01-101-3/+15
| | | | | | | For x86-64, we diverge from GCC -fpatchable-function-entry in that we emit multi-byte NOPs. Differential Revision: https://reviews.llvm.org/D72220
* [X86][AVX] lowerShuffleAsLanePermuteAndShuffle - consistently normalize ↵Simon Pilgrim2020-01-101-2/+2
| | | | | | | | multi-input shuffle elements We only use lowerShuffleAsLanePermuteAndShuffle for unary shuffles at the moment, but we should consistently handle lane index calculations for multiple inputs in both the AVX1 and AVX2 paths. Minor (almost NFC) tidyup as I'm hoping to use lowerShuffleAsLanePermuteAndShuffle for binary shuffles soon.
* [NFC] Style cleanupShengchen Kan2020-01-101-9/+10
|
* CodeGen: Use LLT instead of EVT in getRegisterByNameMatt Arsenault2020-01-092-2/+2
| | | | | | Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.
* [ms] [X86] Use "P" modifier on all branch-target operands in inline X86 ↵Eric Astor2020-01-094-50/+31
| | | | | | | | | | | | | | | | | | | assembly. Summary: Extend D71677 to apply to all branch-target operands, rather than special-casing call instructions. Also add a regression test for llvm.org/PR44272, since this finishes fixing it. Reviewers: thakis, rnk Reviewed By: thakis Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72417
* [X86] AMD Znver2 (Rome) Scheduler enablementGanesh Gopalasubramanian2020-01-103-2/+1551
| | | | | | | | | | | | This patch adds the znver2 scheduler model. There are a few improvements in execution units, latencies and throughput compared with znver1. The llvm-mca tests that existed for znver1 were replicated, with the latencies, execution units, timeline and throughput information updated for znver2. Reviewers: craig.topper, Simon Pilgrim Differential Revision: https://reviews.llvm.org/D66088
* [X86] Remove EFLAGS from live-in lists in X86FlagsCopyLowering.Jonas Paulsson2020-01-081-0/+3
| | | | | | | | | | | When EFLAGS is no longer live into a basic block, remove it from the live-in list. Fixes https://bugs.llvm.org/show_bug.cgi?id=44462. Review: Craig Topper Differential Revision: https://reviews.llvm.org/D71375
* [X86] Keep cl::opts at top of file [NFC]Philip Reames2020-01-081-34/+34
|
* [X86] Custom type legalize v4i64->v4f32 uint_to_fp on sse4.1 targets in ↵Craig Topper2020-01-081-9/+11
| | | | | | | | 64-bit mode For v4i64->v4f32 uint_to_fp on pre-avx targets where v4i64 isn't legal, we create two v2i64->v2f32 uint_to_fp nodes that need to be shuffled together. Our codegen for v2i64->v2f32 involves detecting if the number is larger than (2^31 - 1); if so, we do a special division by 2 so we can do a signed conversion, which we need to scalarize, then do a multiply by 2 at the end if we divided earlier. When v4i64 isn't legal we need to split the checking for a larger number and dividing by 2 into two v2i64 vectors. The scalar part can extract the 4 i64 values from those 4 splits. But we can reassemble the 4 scalar f32 results directly into a single v4f32 vector. Then we just need to combine the fixup indications from the 2 halves and we can do the final multiply by 2 fixup on all 4 values if needed at once using a single v4f32 blend and v4f32 fadd. Differential Revision: https://reviews.llvm.org/D72368
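The per-element halve, convert, and double scheme described above can be modeled in scalar C++. This is only a sketch under assumptions, not the code from the patch: the function name is hypothetical, "too large" is taken to mean the value does not fit in a signed 64-bit integer (sign bit set), and the "special division by 2" is assumed to OR the shifted-out low bit back in so that the final rounding is unaffected.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of an unsigned i64 -> f32 conversion built
// from a signed conversion, as sketched in the commit message.
float uint64ToFloatViaSigned(uint64_t x) {
  // Assumption: "too large" means the sign bit is set, so the value
  // cannot be converted directly as a signed 64-bit integer.
  bool tooLarge = (x >> 63) != 0;

  // Assumed "special division by 2": shift right by one and keep the
  // lost low bit as a sticky bit so rounding still sees it.
  uint64_t halved = (x >> 1) | (x & 1);

  // Signed conversion of either the original or the halved value.
  int64_t s = static_cast<int64_t>(tooLarge ? halved : x);
  float f = static_cast<float>(s);

  // Multiply by 2 (done as an add) only if the value was halved.
  return tooLarge ? f + f : f;
}
```

For inputs such as `0`, `1`, `1ULL << 63` and `UINT64_MAX`, this should agree with a direct `static_cast<float>(x)`.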
* [X86] Add isel patterns for bitcasting between v32i1/v64i1 and float/double.Craig Topper2020-01-081-0/+11
| | | | | | We have to do an intermediate jump to a GPR to make the cast. Fixes PR43750.
* [BranchAlign] Compiler support for suppressing branch alignPhilip Reames2020-01-082-2/+50
| | | | | | | | | | | | As discussed heavily in the original review (D70157), there's a need for the compiler to be able to selectively suppress padding (either nop or prefix) to respect assumptions about the meaning of labels and instructions in generated code. Rather than wait for syntax to be finalized - which appears to be a very slow process - this patch focuses on the compiler use case and *only* worries about the integrated assembler. To my knowledge, this covers all cases mentioned to date for clang/JIT support. For testing purposes, I wired it up so that if the integrated assembler was using autopadding for branch alignment (e.g. enabled at command line) then the textual assembly output would contain a comment for each location where padding was enabled or disabled. This seemed like the least painful choice overall. Note that the result of this patch effectively disables the jcc errata mitigation for many constructs (statepoints, implicit null checks, xray, etc...) which is not ideal. It is at least *correct* and should allow us to enable the mitigation for the compiler. Once that's done, and a few other items are worked through, we probably want to come back to this and explore a bundling based approach instead so that we can pad instructions while keeping labels in the right place. Differential Revision: https://reviews.llvm.org/D72303
* [X86] Adding fp128 support for strict fcmpWang, Pengfei2020-01-081-5/+5
| | | | | | | | | | | | Summary: Adding fp128 support for strict fcmp Reviewers: craig.topper, LiuChen3, andrew.w.kaylor, RKSimon, uweigand Subscribers: hiraditya, llvm-commits, LuoYuanke Tags: #llvm Differential Revision: https://reviews.llvm.org/D71897
* [X86] Enable v2i64->v2f32 uint_to_fp code in ReplaceNodeResults on SSE4.1 targetCraig Topper2020-01-071-3/+1
| | | | | | Now that we generate decent code for (v2i64 (setlt zero, X)) on pre-sse4.2 targets I think we can use this now. Differential Revision: https://reviews.llvm.org/D72354
* [X86] Improve lowering of (v2i64 (setgt X, -1)) on pre-SSE4.2 targets. Enable ↵Craig Topper2020-01-071-3/+14
| | | | | | | | v2i64 in foldVectorXorShiftIntoCmp. Similar to D72302 but for the canonical form for the opposite case. I've changed foldVectorXorShiftIntoCmp to form a target independent setcc node instead of PCMPGT now and enabled it for v2i64 on pre-SSE4.2 targets. The setcc should eventually get lowered to PCMPGT or the new v2i64 sequence. Differential Revision: https://reviews.llvm.org/D72318
* [X86] Improve lowering of v2i64 sign bit tests on pre-sse4.2 targetsCraig Topper2020-01-071-0/+13
| | | | | | Without sse4.2 a v2i64 setlt needs to expand into a pcmpgtd, pcmpeqd, 3 shuffles, and 2 logic ops. But if we're only interested in the sign bit of the i64 elements, we can just use one pcmpgtd and shuffle the odd elements to the even elements. Differential Revision: https://reviews.llvm.org/D72302
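The sign-bit shortcut above can be illustrated with a portable scalar model of the vector operations. This is a sketch under assumptions rather than the lowering code itself: the type aliases and function names are hypothetical, `pcmpgtd` is emulated lane by lane, and lanes 1 and 3 are taken to hold the high 32 bits of each i64 element, as on little-endian x86.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4i32 = std::array<int32_t, 4>;
using V2i64 = std::array<int64_t, 2>;

// Emulates pcmpgtd: each 32-bit lane becomes all-ones if a > b, else zero.
V4i32 pcmpgtd(const V4i32 &a, const V4i32 &b) {
  V4i32 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = a[i] > b[i] ? -1 : 0;
  return r;
}

// Hypothetical model of (v2i64 (setlt X, zero)) when only the sign bit
// of each i64 element matters.
V2i64 signMaskV2i64(const V2i64 &x) {
  // View the two i64 elements as four i32 lanes (low half, high half).
  V4i32 lanes{};
  for (int i = 0; i < 2; ++i) {
    uint64_t u = static_cast<uint64_t>(x[i]);
    lanes[2 * i] = static_cast<int32_t>(static_cast<uint32_t>(u));
    lanes[2 * i + 1] = static_cast<int32_t>(static_cast<uint32_t>(u >> 32));
  }

  // One pcmpgtd against zero: lane is all-ones when the lane is negative.
  V4i32 lt = pcmpgtd(V4i32{}, lanes);

  // Shuffle the odd (high-half) results into the even lanes: {1, 1, 3, 3}.
  V4i32 shuf = {lt[1], lt[1], lt[3], lt[3]};

  // Reassemble the four lanes into two i64 mask elements.
  V2i64 out{};
  for (int i = 0; i < 2; ++i) {
    uint64_t lo = static_cast<uint32_t>(shuf[2 * i]);
    uint64_t hi = static_cast<uint32_t>(shuf[2 * i + 1]);
    out[i] = static_cast<int64_t>(lo | (hi << 32));
  }
  return out;
}
```

Only the high half of each element decides the result, so an element such as `0x00000000FFFFFFFF`, whose low half looks negative as an i32, still yields a zero mask.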
* [X86] Pull out repeated SrcVT.getVectorNumElements() call. NFCI.Simon Pilgrim2020-01-071-2/+2
|
* [X86] Standardize shuffle match/lowering function names. NFC.Simon Pilgrim2020-01-071-38/+39
| | | | We mainly use lowerShuffle*/matchShuffle* - replace the (few) lowerVectorShuffle*/matchVectorShuffle* cases to be consistent.
* [MC] Add parameter `Address` to MCInstPrinter::printInstructionFangrui Song2020-01-064-4/+4
| | | | | | | | Follow-up of D72172. Reviewed By: jhenderson, rnk Differential Revision: https://reviews.llvm.org/D72180
* [MC] Add parameter `Address` to MCInstPrinter::printInstFangrui Song2020-01-064-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | printInst prints a branch/call instruction as `b offset` (there are many variants on various targets) instead of `b address`. It is a convention to use address instead of offset in most external symbolizers/disassemblers. This difference makes `llvm-objdump -d` output unsatisfactory. Add `uint64_t Address` to printInst(), so that it can pass the argument to printInstruction(). `raw_ostream &OS` is moved to the last to be consistent with other print* methods. The next step is to pass `Address` to printInstruction() (generated by tablegen from the instruction set description). We can gradually migrate targets to print addresses instead of offsets. In any case, downstream projects which don't know `Address` can pass 0 as the argument. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72172
* [X86] Move an enum definition into a header to simplify future patches [NFC]Philip Reames2020-01-062-24/+26
|
* [X86] Improve v4i32->v4f64 uint_to_fp for AVX1/AVX2 targets.Craig Topper2020-01-061-0/+15
| | | | | | Use zext+or+fsub to do the conversion. Similar to D71971. Differential Revision: https://reviews.llvm.org/D71971
* [X86] Fix an 8 bit testb being selected when folding a volatile i32 load ↵Amara Emerson2020-01-061-0/+11
| | | | | | pattern. Differential Revision: https://reviews.llvm.org/D71581
* Fix "use of uninitialized variable" static analyzer warnings. NFCI.Simon Pilgrim2020-01-061-0/+2
| | | | Add "unreachable" default cases like we do for the other switch()s in X86MCInstLower::Lower
* [CostModel][X86] Add missing scalar i64->f32 uitofp costsSimon Pilgrim2020-01-061-0/+4
|
* [NFC] Fix trivial typos in commentsJames Henderson2020-01-061-1/+1
| | | | | | | | Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72143 Patch by Kazuaki Ishizaki.
* Add interface emitPrefix for MCCodeEmitterShengchen Kan2020-01-061-89/+133
| | | | Differential Revision: https://reviews.llvm.org/D72047
* [X86] Improve v2i64->v2f32 and v4i64->v4f32 uint_to_fp on avx and avx2 targets.Craig Topper2020-01-051-24/+125
| | | | | | | | | | | | | | | | | | | | | Summary: Based on Simon's D52965, but improved to handle strict fp and improve some of the shuffling. Rather than use v2i1/v4i1 and let type legalization continue, just generate all the code with legal types and use an explicit shuffle. I also added an explicit setcc to the v4i64 code to match the semantics of vselect which doesn't just use the sign bit. I'm also using a v4i64->v4i32 truncate instead of the shuffle in Simon's original code. With the setcc this will become a pack. Future work can look into using X86ISD::BLENDV and a different shuffle that only moves the sign bit. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71956
* [NFC] Modify the format:Liu, Chen32020-01-061-2/+1
| | | | Drop the else since we already returned in the if.
* [X86][SSE] Combine combineLogicBlendIntoConditionalNegate for VSELECT nodes ↵Simon Pilgrim2020-01-051-2/+13
| | | | | | | | (PR43660) Attempt to use combineLogicBlendIntoConditionalNegate for (select M, (sub 0, X), X) -> (sub (xor X, M), M) We limit this to cases that can't easily replace the VSELECT with a shuffle (non-constant masks) or where a BLENDV is likely to occur (which tends to result in slower codegen).
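The rewrite (select M, (sub 0, X), X) -> (sub (xor X, M), M) relies on each mask element being either all-zeros or all-ones. A minimal scalar sketch of that identity follows; the function name is hypothetical and the code models a single 32-bit lane rather than the DAG combine itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-lane model: returns -x when m is all-ones (-1)
// and x when m is zero, without using a select.
int32_t conditionalNegate(int32_t x, int32_t m) {
  assert((m == 0 || m == -1) && "mask must be all-zeros or all-ones");
  // Unsigned arithmetic avoids signed-overflow undefined behavior.
  uint32_t ux = static_cast<uint32_t>(x);
  uint32_t um = static_cast<uint32_t>(m);
  // m == -1: (x ^ -1) - (-1) == ~x + 1 == -x
  // m ==  0: (x ^  0) -   0  ==  x
  return static_cast<int32_t>((ux ^ um) - um);
}
```

With an all-ones mask the XOR produces the bitwise complement and subtracting -1 adds one, which is two's complement negation; with a zero mask both operations are no-ops.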
* [X86] Move combineLogicBlendIntoConditionalNegate before combineSelect. NFCI.Simon Pilgrim2020-01-051-62/+62
| | | | Updates function order in preparation of future fix for PR43660
* [X86] Merge (identical) LowerGC_TRANSITION_START and LowerGC_TRANSITION_END ↵Simon Pilgrim2020-01-052-27/+4
| | | | | | (NFC) Silences a copy+paste analyzer warning - both functions just insert NOOPs in exactly the same way.
* GlobalISel: Add type argument to getRegBankFromRegClassMatt Arsenault2020-01-032-4/+5
| | | | | | AMDGPU can't unambiguously go back from the selected instruction register class to the register bank without knowing if this was used in a boolean context.
* [X86] Improve for v2i32->v2f64 uint_to_fpCraig Topper2020-01-031-36/+14
| | | | | | | | | | | | | | This uses an alternative implementation of this conversion derived from our v2i32->v2f32 handling. We can zero extend the v2i32 to v2i64, OR it with the bit representation of 2.0^52 which will give us 2.0^52 plus the 32-bit integer since double's mantissa is 52 bits. Then we just need to subtract 2.0^52 as a double and let the floating point unit normalize the remaining bits into a valid double. This is fewer instructions than our previous code, but does require a port 5 shuffle for the zero extend or unpack. Differential Revision: https://reviews.llvm.org/D71945
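The zero-extend, OR, and subtract sequence described above can be checked on a single element in plain C++. This is a sketch assuming IEEE-754 binary64 doubles; the function name is hypothetical, and the constant `0x4330000000000000` is the bit pattern of 2^52 (biased exponent 1075, zero mantissa).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of the u32 -> f64 conversion trick.
double uintToDoubleViaMagic(uint32_t x) {
  static_assert(sizeof(double) == sizeof(uint64_t),
                "sketch assumes a 64-bit IEEE-754 double");

  // Bit pattern of 2^52 as a double.
  const uint64_t magicBits = 0x4330000000000000ULL;

  // Zero extend x to 64 bits and OR it into the low mantissa bits.
  // The result, read as a double, is exactly 2^52 + x.
  uint64_t bits = magicBits | static_cast<uint64_t>(x);

  double biased;
  std::memcpy(&biased, &bits, sizeof(biased));
  double magic;
  std::memcpy(&magic, &magicBits, sizeof(magic));

  // Subtracting 2^52 leaves x, normalized by the floating point unit.
  return biased - magic;
}
```

Because every 32-bit integer fits in the 52-bit mantissa, the subtraction is exact and the result should match `static_cast<double>(x)` for all inputs.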
* Move tail call disabling code to target independent codeReid Kleckner2020-01-031-7/+1
| | | | | | | | | | | | | | | | | When the "disable-tail-calls" attribute was added, checks were added for it in various backends. Now this code has proliferated, and it is something the target is responsible for checking. Move that responsibility back to the ISels (fast, global, and SD). There's no major functionality change, except for targets that never implemented this check. This LLVM attribute was originally added in d9699bc7bdf0362173fcd256690f61a4d47429c2 (2015). Reviewers: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D72118
* Fix typo "psuedo" in commentsJay Foad2020-01-031-1/+1
|
* [X86] Reorder X86any* PatFrags to put the strict node first so that chain ↵Craig Topper2020-01-034-10/+10
| | | | | | | | | | | property will be inferred for the instruction by the tablegen backend. Also use X86any_vfpround instead of X86vfpround in some instruction definitions so the strict version can be used to infer the chain property. Without these changes we don't propagate strict FP chain through isel for some instructions.
* [X86] Re-enable lowerUINT_TO_FP_vXi32 under fast-math by using an FSUB ↵Craig Topper2020-01-021-15/+9
| | | | | | | | | | | | | | | | | | | | | | | | instead of an FADD. Summary: We previously disabled this under fast math due to aggressive reassociation by the machine combiner. But I think we can work around this by using a FSUB instead of FADD for the first operation. This matches the similar algorithm we do for uint_to_fp i64->f64 in TargetLowering::expandUINT_TO_FP. If reassociation hasn't been a problem for that, hopefully it's not a problem here. Reviewers: RKSimon, spatel, scanon Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71968
* [X86] Enable strict FP by default and remove option ↵Wang, Pengfei2020-01-031-0/+3
| | | | -disable-strictnode-mutation. NFCI.
* [X86] Optimization of inserting vxi1 sub vector into vXi1 vectorWang, Pengfei2020-01-031-2/+20
| | | | | | | | | | | | | | | | | Summary: After the bugfix for the undef value case here, we used more operations to insert a vXi1 subvector into a vXi1 vector. This patch optimizes it to use fewer operations. The history is at https://reviews.llvm.org/D68311 Reviewers: craig.topper, LuoYuanke, yubing, annita.zhang, pengfei, LiuChen3, RKSimon Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D71917
* [X86] Move STRICT_ ISD nodes into the new section of X86ISelLowering.h where ↵Craig Topper2020-01-021-4/+17
| | | | STRICT nodes are collected after D71841
* X86: remove unused variableSaleem Abdulrasool2020-01-021-1/+0
| | | | | Remove the now-unused variable from aa17d31edb00c66461093b5a7cd2f4a35dc143e9; the unused variable breaks `-Werror` builds.
* [X86] Remove FP0-6 operands from call instructions in FPStackifier pass. ↵Craig Topper2020-01-021-9/+11
| | | | | | | | | | | | | | | Only count defs as returns. All FP0-6 operands should be removed by the FP stackifier. By removing these we fix the machine verifier error in PR39437. I've also made it so that only defs are counted for STReturns which removes what I think were extra stack cleanup instructions. And I've removed the regcall assert because it was checking the attributes of the caller, but here we're concerned with the attributes of the callee. But I don't know how to get that information from this level.
* [FPEnv] Default NoFPExcept SDNodeFlag to falseUlrich Weigand2020-01-021-6/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NoFPExcept bit in SDNodeFlags currently defaults to true, unlike all other such flags. This is a problem, because it implies that all code that transforms SDNodes without copying flags can introduce a correctness bug, not just a missed optimization. This patch changes the default to false. This makes it necessary to move setting the (No)FPExcept flag for constrained intrinsics from the visitConstrainedIntrinsic routine to the generic visit routine at the place where the other flags are set, or else the intersectFlagsWith call would erase the NoFPExcept flag again. In order to avoid making non-strict FP code worse, whenever SelectionDAGISel::SelectCodeCommon matches on a set of original nodes none of which can raise FP exceptions, it will preserve this property on all result nodes generated, by setting the NoFPExcept flag on those result nodes that would otherwise be considered as raising an FP exception. To check whether or not an SD node should be considered as raising an FP exception, the following logic applies: - For machine nodes, check the mayRaiseFPException property of the underlying MI instruction - For regular nodes, check isStrictFPOpcode - For target nodes, check a newly introduced isTargetStrictFPOpcode The latter is implemented by reserving a range of target opcodes, similarly to how memory opcodes are identified. (Note that there is a bit of a quirk in identifying target nodes that are both memory nodes and strict FP nodes. To simplify the logic, right now all target memory nodes are automatically also considered strict FP nodes -- this could be fixed by adding one more range.) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D71841
* [NFC] Make the type of X86AlignBranchBoundary compatibleShengchen Kan2020-01-021-1/+1
| | | | | | Change the type of X86AlignBranchBoundary from cl::opt<uint64_t> to cl::opt<unsigned> since the template class cl::opt is only instantiated with type unsigned, int, std::string, char and bool.
* [X86] Call SimplifyMultipleUseDemandedBits from combineVSelectToBLENDV if ↵Craig Topper2020-01-011-24/+42
| | | | | | | | the condition is used by something other than select conditions. We might be able to bypass some nodes on the condition path. Differential Revision: https://reviews.llvm.org/D71984